
Nibbler Refinements


I’ve made a few refinements to the Nibbler design, and now I’m ready to start building the CPU! After looking at what seemed like a million different possible changes and additions, I’ve decided to keep the hardware almost exactly as I originally described it in my first post, with just a few minor changes:

  • Replace the ’175 quad flip-flop with a ’74 dual flip-flop
  • Add a ’32 quad OR chip for glue logic
  • Replace the ALU’s Equal flag with a Zero flag
  • Replace the 2K x 8 RAM with a 4K x 4 RAM

The addition of the OR chip makes the last two changes possible. And although Nibbler will have one more chip than before, I believe it actually makes the overall design simpler to understand.

Flip-Flop: The ’175 was used for the /RESET signal and the Phase bit. Because I was only using two of the four flip-flops on the ’175, switching to the smaller package of the ’74 makes sense. The ’74 also has independent clear inputs for each flip-flop, so now /RESET can force Phase to 0, which wasn’t possible with the ’175.

Zero vs Equal: The original design had an Equal flag, which was set by the ALU’s A=B output as a result of the CMP instruction. This was OK, but a Zero flag is better. It’s set by any instruction that modifies the accumulator, as well as CMP. That makes it possible to do a LD, IN, or NOR, and follow it immediately with a conditional jump JZ or JNZ, without ever doing a CMP. That wasn’t possible with the Equal flag.

The Zero flag is generated by using three OR gates to OR together the 4 bits of the ALU result: /Zero = (F0+F1)+(F2+F3). I considered an alternative method, where the ALU is operated in active low mode, and the operands and output treat a low voltage as logical 1 and a high voltage as logical 0. This makes the ALU’s A=B output behave like a true Zero flag, but requires the use of some inverting buffers, and requires some staring at the datapath diagram and the datasheet before it’s clear why it works. The OR method is much more intuitive.
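
The OR method can be modeled in a few lines of Python. This is only a sketch of the logic equation above, not of the actual wiring; the function name `zero_flag_n` is made up for illustration.

```python
def zero_flag_n(f):
    """Return /Zero (active low) for a 4-bit ALU result, using three 2-input ORs."""
    f0, f1, f2, f3 = [(f >> i) & 1 for i in range(4)]
    or_a = f0 | f1          # first OR gate
    or_b = f2 | f3          # second OR gate
    return or_a | or_b      # third OR gate: 0 only when all four bits are 0


if __name__ == "__main__":
    for value in range(16):
        print(f"F={value:04b}  /Zero={zero_flag_n(value)}")
```

Only F=0000 drives /Zero low; every other result leaves it high.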

4-Bit RAM: My original plan called for an 8-bit RAM, with four of the data pins unused. That’s a little awkward, but none of the 4-bit RAMs I found had a /OE output enable pin, so they couldn’t be used with Nibbler as-is. 8-bit RAMs typically have a /OE pin, making them easier to work with. The reason is somewhat complicated: with only /CE and R/_W inputs on the 4-bit RAM, it’s not possible to both enable it only for the appropriate instructions and gate the write-enable signal with the clock (necessary to prevent accidental writes) without external glue logic.

With the addition of the ’32 quad OR chip, I can use one OR gate for the necessary glue. The /CERAM control signal is OR’d with the clock, and connected to the RAM’s /CE input. This ensures that RAM is only enabled during the second half of the clock cycle, and only during clock cycles where the microcode wants it enabled. The /WERAM control signal is connected directly to the RAM’s R/_W input. If the RAM is enabled and it’s not writing, then it functions in read mode, and drives a nibble onto the data bus.
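
Here is a small Python model of that glue logic, as a sketch under two assumptions that aren’t spelled out above: the clock is high during the first half of the cycle and low during the second half, and R/_W low means write. The function name `ram_state` is hypothetical.

```python
def ram_state(ceram_n, weram_n, clk):
    """Model the RAM's behavior for given control signals (all values 0 or 1).

    Assumes clk is 1 in the first half of the cycle and 0 in the second half,
    and that R/_W = 0 selects a write.
    """
    ce_n = ceram_n | clk    # the single OR gate: /CE = /CERAM OR CLK
    if ce_n == 1:
        return "idle"       # chip not enabled, so no read and no write
    return "write" if weram_n == 0 else "read"


if __name__ == "__main__":
    for ceram_n in (0, 1):
        for weram_n in (0, 1):
            for clk in (1, 0):
                print(ceram_n, weram_n, clk, ram_state(ceram_n, weram_n, clk))
```

Because /CE stays high whenever the clock is high, a write can only happen in the second half of the cycle, which is what gates the write against accidental triggering.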




New Instructions

What about possible new instructions? I’ve decided to keep things as simple as possible, and stick to 16 instructions, selected by the high 4 bits of a byte of program memory. Furthermore, I’m only going to consider instructions that can work with the current datapath and control path, and can be implemented solely by changing the microcode ROM contents. That rules out many of the possible new instructions that I discussed previously, but keeps everything much simpler.
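
As a quick illustration of the encoding, splitting a program byte into its opcode looks like this in Python. It’s a sketch: the function name is made up, and what the low 4 bits mean depends on the instruction, so they are just returned as-is.

```python
def split_program_byte(byte):
    """Split a program byte into (opcode, low_nibble).

    The high 4 bits select one of the 16 instructions. The low nibble is
    returned uninterpreted, since its meaning is not defined here.
    """
    opcode = (byte >> 4) & 0xF
    low_nibble = byte & 0xF
    return opcode, low_nibble


if __name__ == "__main__":
    print(split_program_byte(0xA7))
```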

So, exactly which 16 instructions will it be? I’m not sure yet, but I don’t have to be. I can update the contents of the microcode ROM after the CPU is built, based on my experience writing programs for it, to select the 16 instructions that prove to be most useful. However, 14 of those 16 are almost certain. The only real question is whether JNZ and JNC should be replaced with something else.



Incidentally, I never realized the double-entendre in “driving the data bus” until now. Maybe when I retire, I’ll buy an old yellow school bus, paint DATA BUS on the sides, and drive it to electronics shows shouting out the window “Look, I’m driving the data bus!” Fun times, guaranteed.



11 Comments so far

  1. Hans Franke - September 9th, 2013 5:40 pm

    One last question still to be answered: Why a ’181 at all?

  2. goatboy - September 10th, 2013 2:30 pm

    Because making an ALU out of 74xx series logic would increase the chip count by about 200% presumably?

  3. Hans Franke - September 10th, 2013 3:30 pm

    But that’s not necessary. It’s rather about replacing the ’181 with a 2716: A(4), B(4), Carry(1) and Function(2) as input (address), and C(4), Carry(1), Zero(1) and 2 ‘other’ results as output. Still just one chip with 24 pins and a similar package, so exactly the same chip count, pin count and size requirement. But no hassles in figuring out how to create Carry or Zero, or what logic level to use, or what function. Think of it as a fully programmable gate array :))

    In fact (but that would require a bit of a change), the ALU could even be made a part of the microcode.

    BTW, isn’t the CY7C167/8 series already on the canceled list?
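
To make the ROM-as-ALU idea concrete, here is a rough Python sketch that builds a 2K x 8 lookup table with the input and output widths listed above. The exact bit positions, the function codes, and the subtract-style compare are all assumptions chosen for illustration; nothing here reflects an actual Nibbler ROM image.

```python
def alu_rom_image():
    """Build a 2048-byte table for a hypothetical ROM-based 4-bit ALU.

    Assumed address layout: bits 0-3 = A, bits 4-7 = B, bit 8 = carry in,
    bits 9-10 = function. Assumed data layout: bits 0-3 = result,
    bit 4 = carry out, bit 5 = zero, bits 6-7 = spare.
    """
    rom = bytearray(2048)
    for addr in range(2048):
        a = addr & 0xF
        b = (addr >> 4) & 0xF
        carry_in = (addr >> 8) & 1
        func = (addr >> 9) & 3
        if func == 0:                       # ADD with carry in
            total = a + b + carry_in
        elif func == 1:                     # NOR
            total = ~(a | b) & 0xF
        elif func == 2:                     # compare as A - B (bit 4 set = no borrow)
            total = a + (~b & 0xF) + 1
        else:                               # unused function slot
            total = 0
        result = total & 0xF
        carry_out = (total >> 4) & 1
        zero = 1 if result == 0 else 0
        rom[addr] = result | (carry_out << 4) | (zero << 5)
    return rom


if __name__ == "__main__":
    rom = alu_rom_image()
    addr = 7 | (9 << 4)                     # ADD 7 + 9, carry in 0
    print(hex(rom[addr]))
```

Since the flags are just more data bits in the table, Carry and Zero come for free, and the unused function slot could hold any fourth operation.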

  4. Steve Chamberlin - September 10th, 2013 7:49 pm

    That’s a good point. I suppose any purely-combinatorial 7400 part could be replaced by a ROM, although it would be slower and probably cost a little more. I wasn’t able to find any 2716’s from normal distributors, but I bought two from eBay specifically for this project.

  5. Ale - September 11th, 2013 12:52 am

    But why is it better to use an (EP)ROM instead of a ’181? (Some desktop calculators from HP, like the HP9820, used 2 ROMs for the ALU to compute BCD arithmetic.) I’d use a CPLD 🙂

  6. Hans Franke - September 11th, 2013 3:35 am

    @Steve: Every part of a determinable circuit can be replaced by an ((E)P)ROM, thus every (determinable) combination thereof can too. Any static function (that’s where every output is defined by some or all inputs) can be replaced by a ROM with the same number of inputs (address lines) and outputs (data lines). The number of I/O lines can be reduced by encoding/decoding several lines that are exclusive to each other.

    If a function contains storage, one additional input and one output per bit of storage is needed. In reality, several such storage bits would be combined into a state representation, sometimes (if the states are again exclusive) reducing the need for state lines. This happens whenever a given group of storage cells does not reach all combinations. For example, if your circuitry contains a Zero and a Negative marker, they will only use 3 of the 4 combinations possible. Find another pair like that and you’ll need just 3 lines to encode them as a global state, halving the ROM size, but adding a decoder. Depending on the number of lines, the decoder might again be a ROM.

    Regarding 2716 as storage for the microcode: in the good ol’ days ™, 256×4 PROMs like the 74S287 would have been used. 4 (or, with some optimization in microcode, 3) little 16-pin chips would replace your two big 27xx :)) Also, with a speed grade of <30 ns max, the Nibbler could reach quite a bit higher speed (at least compared to back-then EPROMs with 300+ ns). BTW, they are still available, so all you may miss is a suitable programmer 🙂

  7. Hans Franke - September 11th, 2013 4:08 am

    @Ale: ROMs can be as fast as (if not faster than) a ’181, but at the same time allow ANY possible output for a given input pattern. This enables the developer to squeeze in any logic/arithmetic operation needed, without schlepping the burden of unwanted operations.

    For the Nibbler, Steve just uses three operations (ADD, NOR, CMP), but he needs to create 5 microcode outputs to control the ALU. Using a (P)ROM, only two would be needed (and even offer a 4th operation, like a shift right, maybe to be used instead of one of the complementary jumps).

    With a (P)ROM-based ALU, the Nibbler microcode word could be reduced by 3 to 13 bits. With some signal reordering, a reduction to 12 output signals could be reached, thus reducing the PROMs needed to just three. If that won’t work, adding a 3 to 7 decoder (’138) for the OEs will shrink it to 12 bits. This won’t reduce the pin count, but replaces an S287 with a way cheaper ’138. With two 2:4 decoders (a single ’139) it’ll even go down to 11 bits, freeing one for additional functionality. With some careful planning, the ROM width could be reduced further to 8 bits. This wouldn’t really save a chip, rather add at least one, but it would come to a significant cost reduction (at least back then :))

  8. Ale - September 12th, 2013 8:37 pm

    I understand that, but with such criteria one could choose a small CPLD as the ALU too… I rewrote the schematic in Verilog, but without using microcode; a bunch of logic is used instead. I still have to do some more debugging.

  9. Hans Franke - September 13th, 2013 1:22 am

    a) a CPLD isn’t much different from a PROM
    b) CPLDs weren’t around when HP did the calculators
    c) the Nibbler is supposed to be built of standard TTL(-like) components. (E)PROMs are the only programmable devices allowed

  10. Hans Franke - September 14th, 2013 7:47 am

    I just figured out the missing instruction: WAIT

    I was thinking about how to implement a simple 4-way (pocket) calculator. Input handling did require quite some code. The first solution would have been some interrupt handling, but there’s no subroutine function. The second was a special kind of hardware-induced switch, a bit like a vectored interrupt, but without a way to return (still quite useful). Still, it would require some additional hardware. So WAIT came up as a way to handle waiting for external signals.

    WAIT will stop execution until a certain condition at the given port is met. This might simplify input routines quite a bit. I can think of three possible implementations (high level) for WAIT:

    a) every cycle the port is read until a non zero value is returned
    WAIT #$0 – Continue if a non zero value is returned by port 0

    b) like a), but instead of routing thru, the port value is ANDed with the value in A, so only the port bits set in A will be checked
    WAIT #$0 – continue when one of these buttons is pressed

    c) like b), but the operation performed is XOR, so changes either way could be detected
    IN #$0 – LD STATE could be used instead
    WAIT #$0 – Wait until any button change

    I’m not sure, but it might be possible to have the last WAIT cycle load the XORed Value into A, so further tests can be performed from there

    For Nibbler-B I did implement the instruction as a combination, keeping the XORed value in A and the new one in B, so further handling can be done from these values.

  11. Johannes Grad - September 27th, 2013 12:18 pm

    You probably saw the news reports about the carbon-nanotube computer built at Stanford. They built the whole thing with a single Turing-complete instruction: SUBNEG. That could be a good way to reduce the chip count even more and still have a Turing complete architecture. I wish I had time to try it myself 🙂
