10 upgrades to the original Ben Eater architecture without adding more breadboards

tags: `8bit`

So you have finished your Ben Eater breadboard computer build? And now you want to learn a little more, add a little more power, and write somewhat more ambitious programs? Here are 10 improvements to the original design that are relatively easy to do, require only a few more ICs, and will all fit on the original 2x7 breadboards.

I suggest making these improvements in the order they are presented here. The description of each improvement assumes that the previous improvements have been implemented.

Origins of these improvements

This document got started because I decided to source most of my components from a local electronics shop for my build. This meant the exact ICs that Ben Eater used were not always available and I sometimes had to use different ICs, notably:

Instead of the two 74189 16-byte RAM ICs, I got a single GM76C28A-10 2k RAM IC.
Instead of the three 28C16 2k EEPROM ICs, I got three AT28C64B 8k EEPROM ICs.

Initially, I made the build as close to Ben Eater's as I could get it. However, the RAM module had to be re-designed because the GM76C28A-10 RAM IC uses combined input/output data lines, where the original design calls for separate input and output data lines.

After getting the original design (mostly) working I started planning some enhancements with Joost Vromen and Werner van Ipenburg to leverage the extra address pins on my replacement ICs. Some of them have already been implemented, others are on the to-do list. An assembler and simulator for the final machine as envisioned in this document can be found here: https://github.com/wmvanvliet/8bit/tree/final

Some upgrades were inspired by these Reddit threads:

and this Arduino project:

https://create.arduino.cc/projecthub/david-hansel/breadboard-computer-programmer-1e7a09

1. Multiplexed RO/IO/CO/AO/EO control lines

Required components: Some hook-up wire

The machine uses EEPROMs for the control logic, containing microcode. The flexibility created by using microcode instead of a hard-wired logic circuit is a huge boost to the system as a tool for exploration of CPU design, allowing us to add new features fast (for inspiration, see the rest of this document).

However, EEPROMs do not change state instantaneously. Whenever the address lines on the two microcode EEPROM ICs change, we have a period (150ns in my case) where the EEPROM outputs are undefined. This means that during this time, the control lines that are connected to these outputs are bouncing around.

To deal with this, the general design of the machine is to set the control lines on the down-flank of the clock, and have the system "listen" to them only on the up-flank. Except we don't. Since the outputs of the instruction and flags registers are hooked up directly to the address pins of the EEPROMs, the control lines change as soon as these registers change, which is on the up-flank of the clock. This is not good, as the control lines are bouncing around while the different modules are actively listening to them.

When multiple modules start writing to the bus simultaneously, you get bus fighting and a spike in power consumption. Add in the poor power distribution across the breadboards, and this spike wreaks havoc on the RST, HLT, and OI lines. With better power management, the spike may be handled well enough to not cause problems, but it would be nice to prevent bus fighting altogether.

The solution is to take the 3-8 decoder used to reset the microstep counter and place it on the control line board instead (or use a new one, but in improvement 3 we will remove the existing 3-8 decoder anyway).

Hook up the last 3 outputs of the microcode EEPROM, controlling the 3 least-significant bits of the control word, to the 3 inputs of the decoder. The first output of the decoder is low when all 3 inputs are low, and we keep this one unconnected for the "no output control lines are active" condition. We use the other outputs of the decoder to drive the RO, IO, CO, AO and EO lines. The existing hex-inverter ICs are used to invert the signal when necessary, so the proper signal is sent over the control line and to the indicator LED. The decoder makes sure only one output line is pulled low at any time, so no more bus fighting!

Of course, we must update our microcode to know about the new way of controlling the output control lines:

#define RO 0b0000000000000001
#define IO 0b0000000000000010
#define CO 0b0000000000000011
#define AO 0b0000000000000100
#define EO 0b0000000000000101
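To see why the decoder rules out bus fighting, here is a minimal C sketch of a 3-to-8 decoder with active-low outputs, as described above. The five codes come from the definitions above; the function names and the OUT_FIELD_MASK constant are made up for illustration.

```c
#include <stdint.h>

/* Output-select codes from the microcode definitions above
 * (the low three bits of the control word). */
#define RO 0x1  /* 0b001 */
#define IO 0x2  /* 0b010 */
#define CO 0x3  /* 0b011 */
#define AO 0x4  /* 0b100 */
#define EO 0x5  /* 0b101 */

/* Hypothetical name: mask selecting the 3-bit output field. */
#define OUT_FIELD_MASK 0x7

/* Model of a 3-to-8 decoder with active-low outputs:
 * bit n of the result is 0 only for the selected output n. */
uint8_t decoder_outputs(uint16_t control_word)
{
    unsigned select = control_word & OUT_FIELD_MASK;
    return (uint8_t)~(1u << select);
}

/* Count how many decoder outputs are pulled low. */
int lines_pulled_low(uint8_t outputs)
{
    int count = 0;
    for (int bit = 0; bit < 8; bit++) {
        if (((outputs >> bit) & 1u) == 0)
            count++;
    }
    return count;
}
```

Whatever the EEPROM outputs happen to be while they settle, lines_pulled_low(decoder_outputs(word)) is always 1, so at most one module can be told to drive the bus.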

2. Latching the microcode EEPROM address

Required components: one 74LS273 8-bit register IC, some hook-up wire

We might have prevented bus-fighting during the bouncing of the control signals, but the instruction and flags registers are still violating the basic design of only changing control lines on the down-flank of the clock. We can address this with an extra 74LS273 8-bit register. There is plenty of space for it on the breadboard below the instruction register.

Instead of hooking up the instruction register to the EEPROM address pins directly, it is hooked up to 4 of the inputs of the 74LS273 8-bit register. The flags register is hooked up to an additional 2 inputs of this register. Finally, the outputs of the 74LS273 are hooked up to the EEPROM address pins. The 74LS273 has no output-enable pin: its active-low reset pin is tied high and its clock pin is tied to the inverted clock signal.

Now, the EEPROM address pins only ever change on the down-flank of the clock and the system should be much more stable!
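The timing can be sketched in a few lines of C. This is only a behavioural model under the assumption that the register captures its inputs on the falling edge of the system clock, as described above; the AddressLatch type and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the address latch: q is the value currently
 * presented to the EEPROM address pins. */
typedef struct {
    uint8_t q;
    bool prev_clock;
} AddressLatch;

/* Feed one sample of the system clock and the register outputs (d).
 * The latch's clock pin sees the inverted system clock, so it captures
 * d on the falling edge of the system clock. Returns the latched value. */
uint8_t address_latch_sample(AddressLatch *latch, bool clock, uint8_t d)
{
    bool falling_edge = latch->prev_clock && !clock;
    if (falling_edge)
        latch->q = d;
    latch->prev_clock = clock;
    return latch->q;
}
```

Changes to d while the clock is high, or while it stays low, never reach q; only the value present at the high-to-low transition does.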

3. More efficient microcode

Required components: Some hook-up wire

Every instruction currently takes a full 5 microsteps. However, we have freed up two EEPROM outputs, and the decoder also still has two free outputs. Let's put them to good use! We can use the final output of the decoder as a control line (SR), so we can reset the microstep counter from the microcode.

In terms of hardware, my first thought was to connect the SR control line to the same 74LS00 NAND gate input where the original reset line had been. However, this causes a problem, as the control lines bounce every time the EEPROM address changes, causing random resets.

A better strategy is to connect the SR control line to the "input enable" (parallel load) pin of the 74LS161 4-bit counter (see /u/Positive_Pie6876's Reddit post). A reset is performed by making the counter IC read from its 4 input pins, all of which we tie to ground. Crucially, the 74LS161 will only perform the read on a clock pulse, in this case the down-flank of the clock (remember that the clock pin is connected to the inverted clock signal).

To make the physical reset button also reset the microstep counter, you can connect the "inverted reset" signal that was connected to pin 1 of the 74LS00 directly to the reset pin of the 74LS161.

For example, a NOP instruction will be:

#define SR 0b0000000000000111
...
CO|MI, RO|II|CE|SR, 0, 0, 0, 0, 0, 0

taking only 2 microsteps. Note that setting the SR signal high produces a reset on the next down-flank of the clock.
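A minimal sketch of how one could count the microsteps an instruction uses, given its microcode row. The bit positions chosen for MI, II and CE are assumptions (the text does not give them), and OUT_FIELD_MASK and microsteps_used are hypothetical names. Note that SR shares the 3-bit field with the output codes, so RO|SR has the same low bits as SR alone; the sketch only looks for SR and does not model what ends up on the bus.

```c
#include <stdint.h>

#define OUT_FIELD_MASK 0x0007  /* hypothetical name for the 3-bit field */
#define RO 0x0001
#define CO 0x0003
#define SR 0x0007              /* decoder output 7, as defined above */

/* Assumed bit positions for the remaining control lines; the text
 * does not specify them, any bits outside the low three would do. */
#define MI 0x0008
#define II 0x0010
#define CE 0x0020

#define STEPS_PER_ROW 8

/* Return how many microsteps an instruction occupies: the step that
 * selects SR is the last one executed, because the counter reloads
 * zero on the following falling clock edge. */
int microsteps_used(const uint16_t row[STEPS_PER_ROW])
{
    for (int step = 0; step < STEPS_PER_ROW; step++) {
        if ((row[step] & OUT_FIELD_MASK) == SR)
            return step + 1;
    }
    return STEPS_PER_ROW;  /* no SR in the row: all eight steps are used */
}
```

For the NOP row shown above, microsteps_used returns 2.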

4. Adding an Arduino Nano as bootloader

Required components: An Arduino Nano board (or clone), some hook-up wire

When the system is in a stable configuration, we can start thinking about some more ambitious upgrades.

The modified SAP-1 design by Ben Eater is genius in that it sits at a plateau: adding more functionality in a meaningful way requires a big jump in hardware. The main limitation is the need to program the machine through DIP switches. These switches are the simplest and easiest-to-understand way to provide input to the machine, so designing the system around them makes sense. There is no need for more than 16 bytes of RAM, because entering larger programs using the switches becomes tiresome. There is no need for additional instructions, because the kinds of programs you can write using 16 bytes of RAM will not fundamentally change with a more elaborate instruction set. Hence, in order to meaningfully add more functionality to the machine, we first need to overcome the need for programming through the DIP switches.

I think that the easiest way to achieve this is to place an Arduino Nano on the board. This is kind of a cheat, as it means embedding a more powerful CPU into our breadboard CPU. A better alternative may be to attach an SD-card reader instead and use an EEPROM to bit-bang the SPI protocol, but I don't know how feasible this is. So, I'm fine with the Arduino hack for now.

The idea is to make the Arduino disable the EEPROMs upon power-on by setting their CE pin high, and take control over some control lines. Hook up the following pins:

D2-D9 to the bus
D10 to MI control line
D11 to RI control line
A1 to EEPROM CE pins
A2 to SR control line
A3 to the clock (configured as input)

Make the Arduino listen to the clock (pin A3), write to the bus, and use the MI and RI lines to store the value on the bus in the memory address register and RAM. After the program has finished loading, disconnect from the bus and control lines (set the pins to INPUT to put them in high-impedance mode), bring the EEPROM CE pin low and the SR line high.
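The loading sequence itself is simple. Below is a host-side C model of it, not actual Arduino firmware: the Machine struct, pulse_mi, pulse_ri and load_program are hypothetical names, and the 16-byte RAM size is an assumption that matches the machine before the memory upgrade. On a real Nano, each assignment to the bus would become writes to pins D2-D9, and each pulse would mean holding D10 (MI) or D11 (RI) active for one clock cycle observed on A3.

```c
#include <stddef.h>
#include <stdint.h>

#define RAM_SIZE 16  /* assumption: 4-bit addresses at this stage */

/* Hypothetical stand-in for the hardware the loader talks to. */
typedef struct {
    uint8_t bus;               /* value driven on D2-D9 */
    uint8_t address_register;  /* memory address register */
    uint8_t ram[RAM_SIZE];
} Machine;

/* One clock cycle with MI asserted: the address register takes the bus. */
static void pulse_mi(Machine *m)
{
    m->address_register = m->bus & (RAM_SIZE - 1);
}

/* One clock cycle with RI asserted: RAM stores the bus value. */
static void pulse_ri(Machine *m)
{
    m->ram[m->address_register] = m->bus;
}

/* Copy a program into RAM the way the bootloader would:
 * address on the bus plus MI, then data on the bus plus RI.
 * Returns the number of bytes written. */
size_t load_program(Machine *m, const uint8_t *program, size_t length)
{
    size_t count = length < RAM_SIZE ? length : RAM_SIZE;
    for (size_t address = 0; address < count; address++) {
        m->bus = (uint8_t)address;
        pulse_mi(m);
        m->bus = program[address];
        pulse_ri(m);
    }
    return count;
}
```

Each byte costs two clock cycles, so even at a slow clock a full program loads far faster than it could be entered on the DIP switches.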

5. Expanding to 256 bytes of RAM

Required components:

1 Parallel SRAM memory IC like the GM76C28A-10
1 8-bit DIP switch
1 74LS32 quad OR gate
1 74LS157 line multiplexer
1 74LS161 4-bit counter
1 74LS173 4-bit register
1 74LS245 bus transceiver
4 Yellow LEDs
7 Green LEDs
11 220 Ohm resistors
Plenty of hook-up wire

In addition to the above, we will reuse the 74LS245 (bus transceiver), 74LS157 (2-input mux), 74LS04 (inverter), 8-bit DIP switch, toggle switch and push button from the existing build.

Admittedly, this modification is not really easy and is quite a drastic change in terms of hardware, but in my opinion completely worth it. While our memory address and program counter registers only have 4 bits, our data bus is 8 bits, so in theory we could use 8-bit addressing to get access to a whopping 256 bytes of memory. The expansion can be achieved by first doubling up the memory address DIP switch, the 74LS157 line multiplexer, 74LS173 memory address register and 74LS161 program counter. Next, the RAM chip needs to be upgraded to something larger. In all likelihood, the upgraded RAM chip uses the same data lines for both input and output. The circuit must be re-arranged to take this into account, and that is the most complex change we'll have to make.

There is an excellent guide by /u/MironV that provides detailed instructions on how to do it. However, one downside of this design is that it removes the LEDs showing the contents of the RAM. This was a no-go for me, so I created my own design that uses an additional 74LS245 chip, but keeps the LEDs intact:

[Image: schematic of the redesigned RAM module]

Using 8-bit memory addresses also means we have to change the microcode. The great thing about 4-bit addresses was that we could pack a 4-bit instruction and a 4-bit address together in the 8-bit instruction register. But no more. An instruction involving a memory address will now take up 2 bytes of memory and the microcode will have to increase the program counter twice during its execution. For example, here is the modified LDA instruction:

MI|CO, RO|II|CE, MI|CO, MI|RO|CE, RO|AI|SR, 0, 0, 0

Note how we read from RAM three times: once to load the instruction, once more to load the parameter of the instruction, and finally once to load the contents of the requested memory address into the A-register. We've lost some speed, but gained a lot of memory in return!
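The five microsteps can be traced as plain register transfers. This is a minimal C sketch, assuming a simplified Cpu struct (a hypothetical name) that holds only the registers involved; each line is annotated with the microstep it models.

```c
#include <stdint.h>

/* Hypothetical, simplified machine state. */
typedef struct {
    uint8_t pc;        /* program counter */
    uint8_t mar;       /* memory address register */
    uint8_t ir;        /* instruction register */
    uint8_t a;         /* A register */
    uint8_t ram[256];
} Cpu;

/* Execute the five microsteps of the two-byte LDA, one line per step. */
void execute_lda(Cpu *cpu)
{
    cpu->mar = cpu->pc;                         /* MI|CO    */
    cpu->ir = cpu->ram[cpu->mar]; cpu->pc++;    /* RO|II|CE */
    cpu->mar = cpu->pc;                         /* MI|CO    */
    cpu->mar = cpu->ram[cpu->mar]; cpu->pc++;   /* MI|RO|CE */
    cpu->a = cpu->ram[cpu->mar];                /* RO|AI|SR */
}
```

With the opcode at address 0, the operand 0x10 at address 1 and the value 99 stored at address 0x10, execute_lda leaves 99 in the A register and the program counter at 2.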

256 bytes of RAM is enough to write programs such as one that computes a square root.

6. Expanding to 512 bytes of RAM: separate program and data memory segments

Required components: 4-bit DIP switch, hook-up wire

After all of this, we still have three control lines free (two EEPROM outputs and one decoder output). Let's call one of the free EEPROM outputs the "Segment Select" (SS) line and hook it up to the RAM IC as an extra address line. With 9 address lines, we have 512 addressable bytes in theory, but our bus is only 8 bits wide, so addressing will have to work a little differently.

Addresses 0–255 (SS line low) are dedicated to the programming code and will be read-only (the code segment). Any instructions that read from or write to a memory address will have the SS line high and thus operate on addresses 256–511 (the data segment). Any jump instructions will keep the SS line low and thus jump to addresses in the 0–255 range, which should only contain programming code. Downside: no more self-modifying code. Upside: even more memory!

To use the 9th address line during programming, you can hook up a DIP switch to the 74LS157 multiplexer on the programming board. If you've followed my schematics of the RAM upgrade, there should be a multiplexer free on the IC.
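In other words, the SS line simply acts as address bit 8. A one-function C sketch (ram_address is a hypothetical name):

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine the 8-bit address register with the SS line into the 9-bit
 * address seen by the RAM IC. SS acts as address bit 8. */
uint16_t ram_address(uint8_t address, bool segment_select)
{
    return (uint16_t)(((segment_select ? 1u : 0u) << 8) | address);
}
```

ram_address(0, true) is 256, the first byte of the data segment, while ram_address(255, false) is 255, the last byte of the code segment.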

7. Hook up the BO line to unlock call/ret instructions

With 512 bytes of RAM, we can write more ambitious programs. To structure them, it sure would be nice to have subroutines!

To call a subroutine means to jump to the start of the subroutine (easy) and, at the end of the subroutine, to return to the place where we called the subroutine from (hard). Since a subroutine can be called from different places, the return address cannot be hard-coded. Instead, whenever we call a subroutine, we need to store the return address somewhere. From a hardware perspective, storing the return address at memory location 256 (address 0 with the SS line high) is most convenient, since we can force a 0 into the memory address register by having it read from the bus with nothing currently writing to the bus. Figuring out what the return address should be is harder.

At the end of the subroutine, the ret instruction should jump to the instruction after the original call instruction, or we will have an infinite loop. Here is an example program that calls a subroutine to display the value 42:

In assembly language:

    call subroutine
    hlt
subroutine:
    ldi 42
    out
    ret

Compiled into binary code:

;address    content
00000000   00001001   ; opcode for CALL instruction
00000001   00000011   ; address to jump to: start of subroutine
00000010   00001111   ; HLT instruction, subroutine should return here
00000011   00000110   ; start of subroutine: LDI
00000100   00101010   ; value to load into A register: 42
00000101   00001110   ; opcode for OUT instruction
00000110   00001010   ; opcode for RET instruction

When we initiate the call instruction, we are at memory address 00000000. The address we want to jump to, the start of the subroutine, is given as parameter to the call instruction and placed at address 00000001 (remember that our memory addresses are now 8 bits, so we cannot pack them alongside the opcode anymore). The program continues (with a hlt instruction) at address 00000010, so this should be the return address.

Conveniently, the program counter will contain the right return address if we keep incrementing it as we read the call instruction and its parameter. Inconveniently, performing a jump will overwrite the program counter. So, the order in which things need to happen is:

Read the opcode, increase program counter
Read the parameter, increase program counter
Write program counter to memory address 256 (0 and SS line high)
Perform the jump

If you try to compose the microcode for this in your head, you might notice the problem: we need to temporarily store the parameter somewhere while we are writing the return address to the RAM.

The B-register is the perfect place for this. After all, we are already using it to temporarily hold values during addition and subtraction. We just need to hook up the BO control line in order to get a value out of it without using the ALU. Luckily, we still have a line free on our decoder. And with that, we can write the microcode for call and ret (MP is the control line that selects the data segment, called SS above):

MI|CO, RO|II|CE, MI|CO, RO|BI|CE,   MI, MP|CO|RI, BO|J|SR, 0   // 1001 - CALL
MI|CO, RO|II|CE, MI,    MP|RO|J|SR, 0,  0,        0,       0   // 1010 - RET

We can only store a single return address, so we cannot perform a "nested" call, but even with this limitation, it's a convenient thing to have when writing larger programs.
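To check the order of operations, here is a small C trace of the two instructions as register transfers. It is a sketch under simplifying assumptions: the Cpu struct is hypothetical, the two memory segments are modelled as separate arrays, and the hardware B register is called temp.

```c
#include <stdint.h>

/* Hypothetical, simplified machine state. */
typedef struct {
    uint8_t pc;
    uint8_t mar;
    uint8_t ir;
    uint8_t temp;        /* hardware B register */
    uint8_t code[256];   /* segment selected with the segment line low */
    uint8_t data[256];   /* segment selected with it high; data[0] holds
                            the return address */
} Cpu;

void execute_call(Cpu *cpu)
{
    cpu->mar = cpu->pc;                          /* MI|CO     */
    cpu->ir = cpu->code[cpu->mar]; cpu->pc++;    /* RO|II|CE  */
    cpu->mar = cpu->pc;                          /* MI|CO     */
    cpu->temp = cpu->code[cpu->mar]; cpu->pc++;  /* RO|BI|CE  */
    cpu->mar = 0;                                /* MI, idle bus reads as 0 */
    cpu->data[cpu->mar] = cpu->pc;               /* MP|CO|RI  */
    cpu->pc = cpu->temp;                         /* BO|J|SR   */
}

void execute_ret(Cpu *cpu)
{
    cpu->mar = cpu->pc;                          /* MI|CO      */
    cpu->ir = cpu->code[cpu->mar]; cpu->pc++;    /* RO|II|CE   */
    cpu->mar = 0;                                /* MI         */
    cpu->pc = cpu->data[cpu->mar];               /* MP|RO|J|SR */
}
```

Running execute_call on the example program leaves the program counter at 3 and the value 2 in data[0]; a later execute_ret brings the program counter back to 2, the hlt instruction. A second call before the ret would overwrite data[0], which is exactly why nested calls do not work.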

8. Turning it up to 11: virtual registers

At this point, opcodes are still 4 bits: 4 lines drawn from the instruction register, through a 74LS273 buffer register, to the EEPROM address pins. But the EEPROMs I used have 13 address lines, which means we could use all 8 bits of the instruction register:

FLAGS  --------OPCODE--------  MICROSTEP
12 11  10  9  8  7  6  5  4  3   2  1  0

Now all 8 bits of the instruction register are hooked up to the inputs of the 74LS273 register, filling it up completely. So we need another 74LS173 as a buffer for the flags. Plenty of room on the board for that. And with that, we have 256 possible opcodes! Let's put them to good use.
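When generating the microcode image, the address of each control word follows directly from this layout. A minimal C sketch (microcode_address is a hypothetical name, and which flag ends up on bit 12 versus bit 11 is left open here):

```c
#include <stdint.h>

/* Compose the 13-bit microcode EEPROM address from the layout above:
 * bits 12-11 flags, bits 10-3 opcode, bits 2-0 microstep. */
uint16_t microcode_address(uint8_t flags, uint8_t opcode, uint8_t microstep)
{
    return (uint16_t)(((flags & 0x3u) << 11) |
                      ((unsigned)opcode << 3) |
                      (microstep & 0x7u));
}
```

Each opcode thus owns a block of 8 consecutive control words, repeated once for each of the 4 flag combinations, for 4 x 256 x 8 = 8192 addresses in total, exactly the size of an 8k EEPROM.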

Our memory expansion has come at a terrible cost: many instructions now take up two whole bytes (gasp!) and take more microsteps to execute. We can use our abundance in opcodes to offset this somewhat by designing "virtual registers".

With our low clock speeds, reading from RAM is just as fast as reading from the A or B register. This means that we can use the RAM to simulate additional registers at little extra cost. Since memory address 256 (0 with SS line high) is already reserved for the return address, let's assign addresses 257-263 (1-7 with SS high) for "virtual" registers: b, c, d, e, f, g, h. The hardware B-register is never used as a general purpose register and is hereby renamed the "temp" register. The hardware A-register will remain the a register and will be nicknamed the "accumulator".

To read from and write to a virtual register, we need the microcode to be able to write its memory address to the bus somehow. Luckily, we still have the final 4 bits of the instruction register hooked up to the bus from way back in the original design. Let's remove one line and just keep the final 3 bits attached to the bus, matching our 8 registers. Now, all we have to do is make sure that the opcode for any instruction using the virtual "b" register ends with 001, any instruction using the virtual "c" register ends with 010, and so forth, so setting the IO line high will put the correct address on the bus.

Here are the opcodes for loading the contents of a memory address into a register:

00001000 - LDA
00001001 - LDB
00001010 - LDC
00001011 - LDD
00001100 - LDE
00001101 - LDF
00001110 - LDG
00001111 - LDH

You can also look at this as a single LD command with opcode 00001 and the register as a 3-bit parameter packed alongside the opcode. So now our instruction set will sometimes have parameters packed alongside the opcode, sometimes parameters on the following memory address, and sometimes both. For example, here is the command for loading the value 42 into virtual register e:

00010100 - LD immediate into E
00101010 - 42

Here is the corresponding microcode to execute it:

MI|CO, RO|II|CE, MI|CO, BI|RO|CE, IO|MI, BO|MP|RI, SR, 0   // 00010 - LDI

(notice how we use the hardware B ("temp") register as a temporary storage location again)
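The opcode packing can be sketched in C as follows. The two 5-bit base opcodes are taken from the listings above; the enum, macro and function names are hypothetical.

```c
#include <stdint.h>

/* Register numbers as packed into the low three bits of the opcode.
 * Register a is the hardware accumulator and has no RAM slot. */
enum { REG_A, REG_B, REG_C, REG_D, REG_E, REG_F, REG_G, REG_H };

#define OP_LD_MEM 0x01  /* 00001: load from a memory address */
#define OP_LD_IMM 0x02  /* 00010: load an immediate value */

/* Pack a 5-bit base opcode and a 3-bit register number into one byte. */
uint8_t encode_opcode(uint8_t base, uint8_t reg)
{
    return (uint8_t)(((base & 0x1Fu) << 3) | (reg & 0x7u));
}

/* RAM address of the virtual register named by an opcode: the low three
 * bits go onto the bus, and the data segment adds 256. */
uint16_t virtual_register_address(uint8_t opcode)
{
    return (uint16_t)(256u + (opcode & 0x7u));
}
```

encode_opcode(OP_LD_IMM, REG_E) gives 00010100, the first byte of the example above, and virtual_register_address of that opcode is 260, the slot reserved for e.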

9. Working with numbers larger than 8 bits

Use the last available EEPROM output as a separate control line for the XOR gates in the ALU and the carry-in of the adders. With this, you can create ADC and SBC instructions, which perform "add with carry" and "subtract with carry". Now you can add and subtract numbers larger than 8 bits by processing them byte by byte.
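As an illustration of the byte-by-byte approach, here is a minimal C sketch of a 16-bit addition built from two 8-bit additions, the second one taking the carry of the first. The AluResult type and both function names are hypothetical; on the machine itself this would be an ADD on the low bytes followed by an ADC on the high bytes.

```c
#include <stdint.h>

/* Result of one 8-bit ALU operation: the sum byte and the carry flag. */
typedef struct {
    uint8_t value;
    uint8_t carry;
} AluResult;

/* 8-bit addition with carry-in: ADD uses carry_in = 0,
 * ADC passes in the carry flag of the previous addition. */
AluResult add_with_carry(uint8_t a, uint8_t b, uint8_t carry_in)
{
    unsigned sum = (unsigned)a + b + (carry_in ? 1u : 0u);
    AluResult result = { (uint8_t)(sum & 0xFFu), (uint8_t)(sum >> 8) };
    return result;
}

/* Add two 16-bit numbers one byte at a time, low byte first. */
uint16_t add16(uint16_t x, uint16_t y)
{
    AluResult low = add_with_carry((uint8_t)(x & 0xFFu), (uint8_t)(y & 0xFFu), 0);
    AluResult high = add_with_carry((uint8_t)(x >> 8), (uint8_t)(y >> 8), low.carry);
    return (uint16_t)(((unsigned)high.value << 8) | low.value);
}
```

For example, add16(0x01FF, 0x0001) yields 0x0200: the low bytes overflow to 0x00 with the carry set, and the carry is then added into the high bytes.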

10. Designing an orthogonal assembly instruction set

Even when using the final 3 bits as a parameter for some instructions, we have room for a lot of opcodes. Given our restricted memory, it makes sense to implement a large number of them, where each opcode can do a lot of work (a CISC design). A good way forward is to try and implement an orthogonal instruction set, meaning that all instructions can deal with any type of parameter, whether that be a register, direct value or memory address.

My instruction set is modeled after that of the Z80:

LD   #,##            load something into something
ADD  #               add something to the accumulator
SUB  #               subtract something from the accumulator
ADC  #               add something to the accumulator along with the carry flag
SBC  #               subtract something from the accumulator along with the carry flag
CP   #               compare something with the accumulator, only set flags
JP   [C|Z|NC|NZ],#   jump to somewhere, possibly with a condition
DJNZ #               decrease accumulator, jump if accumulator is non-zero (useful for loops)
JSR  #               jump to subroutine
RET                  return from subroutine
OUT  #               output something
HLT                  stop the program, halt the machine

where # and ## can be a register, a memory address or an immediate value. For example, here are all the possible variations of the ld instruction:

ld a,42  ; Load immediate value into register
ld b,42
...
ld h,42

ld a,b  ; Load one register into another
ld b,a
...
ld h,f

ld a,[my_label]  ; Load from RAM into a register
ld b,[my_label]
...
ld h,[my_label]

ld [my_label],a  ; Load a register into RAM
ld [my_label],b
...
ld [my_label],h

ld [my_label],42 ; Load an immediate value into RAM
ld [my_label],[my_other_label] ; Copy between two RAM locations

ld a,[b]         ; Load from RAM location pointed to by register

Note the difference between my_label and [my_label]. The former indicates the memory address of a label, which translates into an immediate value as a parameter to the opcode. The latter means the contents of the RAM at the memory address of the label, which translates into an address as a parameter to the opcode.

Here are the jp variations:

jp my_label     ; Jump to address
jp c,my_label   ; Jump to address on carry
jp z,my_label   ; Jump to address on zero
jp nc,my_label  ; Jump to address on not-carry
jp nz,my_label  ; Jump to address on not-zero

jp a            ; Jump to address in register
jp c,a          ; Jump to address in register on carry
...
jp nz,h         ; Jump to address in register on not-zero

jp [my_label]   ; Indirect jump (jump to address stored in RAM)
jp c,[my_label] ; Indirect jump on carry
...
jp nz,[my_label]
