
Custom 4-Bit CPU Schematic and Control

Enough with the vague design talk – here’s the circuit schematic for the Nibbler 4-bit CPU! The whole system fits on a single page, including the CPU itself and the I/O devices, so it’s easy to wrap your head around.

Except for RAM and ROM, all the chips shown here are common 7400 series parts. I haven’t selected a logic family yet, but most likely they’ll be 74HC or 74HCT, which require less power while offering similar speed to the more common 74LS family.

Program Data

The parts on the schematic are arranged in the same relative positions as in the architecture diagram from my previous post. At the middle-right is the program ROM, where the currently running program is stored. This is an 8Kx8 EEPROM, but Nibbler’s address size only allows for 4K programs, so one of the address inputs is unused and is hard-wired to 0. Program memory is 8 bits wide, and so all 8 of the ROM’s I/O lines are used. Depending on the type of instruction, these may be 4 bits of instruction opcode and 4 bits of immediate operand, or 4 bits of instruction opcode and 4 bits of address, followed by 8 more bits of address. At the start of execution of each instruction, this program byte is loaded into the Fetch register.
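
To make the two instruction layouts concrete, here is a minimal Python sketch of how a program byte splits into opcode and operand, and how a two-byte instruction yields a 12-bit address. The function names and the example byte values are invented for illustration; they are not real Nibbler opcodes.

```python
def split_instruction(first_byte):
    """Return (opcode, operand) from the first program byte of an instruction."""
    opcode = (first_byte >> 4) & 0xF   # high nibble: instruction opcode
    operand = first_byte & 0xF         # low nibble: immediate, or top 4 address bits
    return opcode, operand


def full_address(first_byte, second_byte):
    """Build the 12-bit address carried by a two-byte instruction."""
    _, high_nibble = split_instruction(first_byte)
    return (high_nibble << 8) | (second_byte & 0xFF)


# Example bytes are made up; they only show where each field sits.
opcode, operand = split_instruction(0x3A)
address = full_address(0x3A, 0x7F)
print(opcode, operand, hex(address))
```

A 12-bit address covers 4096 locations, which is why only 4K of the 8K ROM can be reached.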

The address of the program instruction that’s currently being executed is stored in the program counter. The PC consists of three ‘163 4-bit counters, chained together to make a 12 bit logical register. After most instructions, the PC will increment to point to the next instruction. For jump instructions, the PC can also be loaded with a new address. The address comes from the Fetch register operand value (highest 4 bits) and the program ROM byte (lowest 8 bits).
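
The chained counters can be modeled roughly as three 4-bit values with a carry passed from one stage to the next. This is only a behavioral sketch: the real ‘163 chips use their own carry and enable pins, and the function names here are hypothetical.

```python
def pc_next(nibbles, load=False, new_address=0):
    """Advance a 12-bit PC held as three 4-bit values [low, middle, high].

    With load=True the PC takes new_address instead, as for a jump.
    """
    if load:
        return [(new_address >> shift) & 0xF for shift in (0, 4, 8)]
    result = []
    carry = 1                      # the lowest counter always counts
    for value in nibbles:
        total = value + carry
        result.append(total & 0xF)  # each stage keeps only 4 bits
        carry = total >> 4          # overflow enables the next stage
    return result


def pc_value(nibbles):
    """Combine the three counter values into one 12-bit number."""
    low, middle, high = nibbles
    return (high << 8) | (middle << 4) | low


pc = pc_next([0xF, 0xF, 0x0])      # 0x0FF + 1
print(hex(pc_value(pc)))
pc = pc_next(pc, load=True, new_address=0xA7F)
print(hex(pc_value(pc)))
```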

Control and Microcode

At the top left of the schematic are the three chips pertaining to the execution of the current instruction. The Fetch register is a ‘377, an 8-bit register that holds the current instruction opcode in the high 4 bits and immediate or address data in the low 4 bits. ALU flags are stored in the 4-bit Flags register, a ‘173. There are only two flags, carry and equal, so two of the four bits are unused. The last chip in this group is a ‘175, a quad flip-flop. One flip-flop is used to synchronize the reset signal, and another is the Phase bit, which constantly toggles between 0 and 1 to indicate which of the two clock cycles of an instruction’s execution is currently underway. Fetch is loaded at the end of the clock cycle when Phase is 0. The other two flip-flops are unused.

With two chips that are only half-used, is there a way to combine the functions of the ‘173 and the ‘175 into a single chip? Probably not: flip-flops load data on every clock, but the ‘173 needs a load enable for the ALU flags.

The instruction opcode, ALU flags, and phase are combined to form a 7-bit address for the two microcode ROMs, shown at the mid-left. The output of the two ROMs constitutes the 16 control signals needed to orchestrate the behavior of all the other chips. The microcode is stored in two 2Kx8 EEPROMs, so four of the eleven address inputs on each ROM are unused and hard-wired to 0.
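
Here is a small Python sketch of the microcode lookup. The text only says that opcode, flags, and phase together form 7 address bits, so the bit ordering below is an assumption, as is the choice of which ROM supplies the upper 8 control signals. All names are hypothetical.

```python
ROM_SIZE = 2048  # each microcode ROM is 2Kx8


def microcode_address(opcode, carry, equal, phase):
    """Pack opcode (4 bits), two flags, and phase into a 7-bit address.

    Assumed layout: opcode in bits 6..3, carry in bit 2,
    equal in bit 1, phase in bit 0.
    """
    return ((opcode & 0xF) << 3) | ((carry & 1) << 2) | ((equal & 1) << 1) | (phase & 1)


def control_word(rom_low, rom_high, address):
    """Combine the two 8-bit ROM outputs into 16 control signals."""
    return (rom_high[address] << 8) | rom_low[address]


# Blank ROMs with one made-up entry, just to exercise the lookup.
rom_low = [0] * ROM_SIZE
rom_high = [0] * ROM_SIZE
addr = microcode_address(opcode=0x5, carry=1, equal=0, phase=1)
rom_low[addr], rom_high[addr] = 0x34, 0x12
print(addr, hex(control_word(rom_low, rom_high, addr)))
```

Only addresses 0 to 127 are ever generated, which matches the four unused address inputs on each ROM.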

ALU Datapath

At the bottom-left of the schematic are the ‘181 ALU and the ‘173 accumulator register “A”. The ALU (arithmetic and logic unit) can perform any common arithmetic or logical operation on its two inputs. In this case, one input always comes from the accumulator, while the other is supplied from the data bus. The ALU result is stored back into the accumulator. The ALU, accumulator, and data bus are all 4 bits wide, which is what makes Nibbler a 4 bit CPU.

Carry-In and Carry Flag

If you look carefully, you’ll see that the ALU’s carry-in bit is a control signal provided by microcode, not the carry flag from the Flags register. This is a subtle but important point: the carry flag is an output from an arithmetic instruction, and can be used to make a conditional jump if the carry flag is/isn’t set, but it doesn’t feed back into the ALU to affect later calculations. This means that when performing multi-nibble pair-wise additions, the program must check the carry flag after each nibble addition, and add an extra 1 into the next addition if it’s set.
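
The software carry handling can be sketched in Python like this. It models what a program would have to do, not actual Nibbler code: each nibble pair is added with carry-in at 0, and the extra 1 is added as a separate step when the previous carry was set. The function names are made up for the example.

```python
def add_nibble(a, b):
    """Add two 4-bit values with carry-in fixed at 0; return (sum, carry flag)."""
    total = a + b
    return total & 0xF, total >> 4


def add_multi(a_nibbles, b_nibbles):
    """Add two equal-length numbers stored as nibble lists, lowest nibble first."""
    result = []
    carry = 0
    for a, b in zip(a_nibbles, b_nibbles):
        s, carry_out = add_nibble(a, b)
        if carry:
            # The program checks the carry flag and adds the extra 1 itself.
            s, extra_carry = add_nibble(s, 1)
            carry_out |= extra_carry
        result.append(s)
        carry = carry_out
    return result, carry


# 0x9F + 0x01, stored lowest nibble first
print(add_multi([0xF, 0x9], [0x1, 0x0]))
```

The two partial carries can never both be set for the same nibble, so a single flag bit is enough.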

This was a conscious design choice. If the carry flag did connect to the ALU’s carry-in bit, then the program would need to clear it before performing any single-nibble additions, and those are much more common than multi-nibble additions. Also the carry-in bit can’t simply be hard-wired to 0, because as you’ll see later, the CMP (compare) instruction requires carry-in to be 1 in order to work properly. So carry-in must be provided by the microcode.


RAM is shown at the bottom-center. Its I/O lines are connected to the data bus, and the address comes from the Fetch register operand value (highest 4 bits) and the program ROM byte (lowest 8 bits). Ideally the system would use a 4Kx4 SRAM, to match Nibbler’s address size and data width, but the closest match I could readily find was a 2Kx8 SRAM. That means there will only be 2048 addressable nibbles instead of 4096, and half of the RAM I/O lines will be unused.
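
A rough Python model of the RAM addressing follows. Two things here are assumptions rather than facts from the schematic: that the 12th address bit is simply left unconnected (so the upper half of the address space aliases the lower half), and that the nibble lives in the low 4 bits of each RAM byte. The class and method names are hypothetical.

```python
RAM_SIZE = 2048  # 2Kx8 SRAM


class NibbleRam:
    """Byte-wide RAM used as nibble storage."""

    def __init__(self):
        self.cells = bytearray(RAM_SIZE)

    def _index(self, operand, rom_byte):
        # 12-bit address: Fetch operand on top, program ROM byte below.
        address = ((operand & 0xF) << 8) | (rom_byte & 0xFF)
        # Assumption: the RAM has only 11 address lines, so the top bit is dropped.
        return address % RAM_SIZE

    def write(self, operand, rom_byte, value):
        # Assumption: only the low 4 I/O lines are wired to the data bus.
        self.cells[self._index(operand, rom_byte)] = value & 0xF

    def read(self, operand, rom_byte):
        return self.cells[self._index(operand, rom_byte)] & 0xF


ram = NibbleRam()
ram.write(0x1, 0x23, 0xC)
print(ram.read(0x1, 0x23), ram.read(0x9, 0x23))
```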

Notice the CLK signal is connected to the RAM’s /CE (chip enable) input. This means the RAM will only be enabled during the second half of each clock cycle. This is a simple way of preventing erroneous writes to RAM during the early part of the clock cycle, when the /WE (write enable) signal and RAM address may not yet be valid.
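
The effect of wiring CLK to /CE can be shown with a tiny Python sketch. It assumes both /CE and /WE are active-low and that CLK is high during the first half of the cycle and low during the second, which is what the description above implies. The function name and sample values are made up.

```python
def ram_write_happens(clk, we_n):
    """Return True if the RAM would accept a write at this instant."""
    ce_n = clk                    # CLK is wired straight to /CE
    return ce_n == 0 and we_n == 0


# (clk, /WE) samples across one cycle in which /WE glitches low early.
samples = [
    (1, 0),   # first half: /WE not yet valid, but the RAM is disabled
    (1, 1),   # first half: /WE settles high
    (0, 1),   # second half: RAM enabled, no write requested
]
print([ram_write_happens(clk, we_n) for clk, we_n in samples])
```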

IN and OUT Ports

The IN and OUT ports are also connected to the data bus, and are shown on the schematic at bottom-right. IN0 is a ‘125 4-bit bus driver, which outputs the state of four pushbuttons connected to pull-up resistors. Because there’s only a single IN port, no decoding of the port number is done, and this ‘125 will actually respond to any port number with the IN instruction. If more IN ports were added, then additional port number decoding logic would be needed.

The two OUT ports are ‘173 4 bit registers. OUT1 connects to databus[4..7] of a 16×2 character LCD display using the common HD44780 controller. Although this LCD controller has an 8 bit interface, it can also operate in 4 bit mode, in which case only the highest 4 LCD databus lines are used. OUT0 connects two more lines to the LCD, for the RS and E signals needed to control LCD data transfers. The other two lines from OUT0 connect to an LED, which can be toggled on/off as a basic debugging aid, and to a speaker, which can be bit-banged in software to generate simple square-wave tones at different frequencies.
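
As a sketch of how software might drive the LCD in 4-bit mode, the Python below lists the port writes needed to send one byte: high nibble first, then low nibble, each latched by pulsing E high and then low. The bit positions chosen for RS and E within OUT0 are hypothetical, and the LED and speaker bits are simply left at 0 here.

```python
RS_BIT = 0b0001  # hypothetical position of RS in OUT0
E_BIT = 0b0010   # hypothetical position of E in OUT0


def lcd_write_sequence(byte, is_data):
    """Return the list of (port, value) writes that send one byte to the LCD."""
    rs = RS_BIT if is_data else 0
    writes = [("OUT0", rs)]                  # set RS while E is still low
    for nibble in ((byte >> 4) & 0xF, byte & 0xF):
        writes.append(("OUT1", nibble))      # nibble goes to the LCD's upper data lines
        writes.append(("OUT0", rs | E_BIT))  # raise E
        writes.append(("OUT0", rs))          # drop E: the LCD latches the nibble
    return writes


for port, value in lcd_write_sequence(0x41, is_data=True):
    print(port, format(value, "04b"))
```

A real program would also need delays between writes and the LCD’s initialization sequence, which are left out of this sketch.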

Notice that the ‘173s have two load enable inputs, /G1 and /G2, and both must be low in order to load data to the chip. /G1 of both chips is connected to the /LOADOUT control signal. But as with the IN port, the OUT port number is not fully decoded, in order to avoid needing extra decoding logic. Instead, bit 0 of the port number is connected to OUT0 /G2, and bit 1 to OUT1 /G2. This means that OUT0 will actually respond to any port number where bit 0 is 0, and OUT1 to any port number where bit 1 is 0. It would even be possible to load both OUT ports simultaneously by using a port number where both bits 1 and 0 were 0, although that probably wouldn’t be useful.
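
The partial decoding is easy to check with a short Python sketch. It models only the two enable inputs described above; the function name is invented, and the port number is treated as a 4-bit value.

```python
def out_ports_loaded(port_number, loadout_n):
    """Return the set of OUT ports that load for a given port number.

    loadout_n is the active-low /LOADOUT control signal (0 means load).
    """
    loaded = set()
    g1_n = loadout_n
    out0_g2_n = port_number & 0b0001          # bit 0 drives OUT0 /G2
    out1_g2_n = (port_number >> 1) & 0b0001   # bit 1 drives OUT1 /G2
    if g1_n == 0 and out0_g2_n == 0:
        loaded.add("OUT0")
    if g1_n == 0 and out1_g2_n == 0:
        loaded.add("OUT1")
    return loaded


for port in range(4):
    print(port, sorted(out_ports_loaded(port, loadout_n=0)))
```

Port numbers whose low two bits are both 1 load neither register, and the upper two bits of the port number have no effect at all.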

Bus Drivers

The last two components on the data bus are a pair of 4-bit bus drivers, shown at the center and at the bottom-center of the schematic. These are two halves of a single ‘244 octal driver. One drives the ALU result onto the data bus, which is necessary when storing data to RAM or an OUT port. The other drives the operand value from the Fetch register onto the data bus, which is necessary for instructions that involve an immediate constant value.

More to Come

Next time I’ll post more details about the control signals, microcode, and instruction set. Until then, questions and comments are always welcome!



15 Comments so far

  1. alex555 August 28th, 2013 5:24 pm

    I have recently compared 74fxx chips with other series, and they seem substantially faster. Just a consideration if you have to buy parts, as they are about the same price, too.

  2. Steve Chamberlin August 28th, 2013 5:39 pm

    Thanks for the pointer to 74F. It looks like they’re substantially faster than either 74LS or 74HC, but the power requirements are 3x 74LS and like 100x 74HC. In my case, the speed of the CPU is almost entirely determined by the speed of the ROMs rather than the 7400 parts, so switching to 74F probably wouldn’t speed things up very much but would require a lot more power.

  3. Dave September 2nd, 2013 2:04 pm

    Thanks for this great example. I’ve been looking for this for some time. It’s a great way to get started with FPGA CPUs! Once you have posted all the documentation I’ll be building one of these myself.



  4. Petr Polášek September 3rd, 2013 8:35 am

    There is also the 74AC series, which is very fast while maintaining much lower power consumption. The 74VHC or 74AHC could be a good choice as well.
    “Downgrading” a few PCs (probably XT) may get you some amazingly fast cache SRAMs. Just as an example, my 386 had a few 20ns chips, but 15ns or even 12ns units could be found.
    As for slow ROMs, I would think of ROM shadowing into those SRAMs.

    But I think your aim is not to make a Cray but something dead-simple, lightweight and low-power if I’m right. 🙂

    Bye, Peter.

  5. Steve Chamberlin September 3rd, 2013 10:25 am

    RAM shadowing would be a good idea to improve the clock speed substantially. But you’re right, my goal here is to make something dead simple, not necessarily fast.

    I do feel myself being pulled in two directions:

    – Make a super-simple example of a 7400 CPU doing interesting stuff
    – Make the most powerful 7400 CPU possible

    For the second goal, I’ve been daydreaming about running a sort of 7400 CPU design competition. Within a max board area, and using only currently-available through-hole 7400 logic, RAM, and ROM, build the most awesome CPU possible. Putting limits on the board area and type of parts would force interesting trade-offs, and in my opinion would be more fun than an “anything goes” design. I really enjoy those kinds of challenges, and if I could convince other people to join and create their own designs, even better.

  6. Petr Polášek September 3rd, 2013 1:07 pm

    That CPU competition could be great. But I have a few questions.

    1)Why only through-hole?
    This rules out some modern families.

    2)Is max area size the best measure?
    Someone could use wire-wrap and thus save a lot of space (same goes for SMT). I think of different measures like overall cost, chip count etc.

    3)How to compare power? MIPS? FLOPS?
    Someone may build a DSP-like or calculator-like CPU with large lookup tables while someone may build universal CPU. Their performance will vary greatly depending on how you test them.

    4) What is considered currently available? In my country it is possible to go to a small electronics shop and buy 7400, 5400 and 8400 series chips (standard, S, LS) from the 1960s-1990s. And radio amateurs are selling them as well. On the other hand, the 74G series is something unknown to us, and 74AC, 74VHC or 74AHC are rare to find.
    What about 74xC4000?

    I have thought of building a Brainf**k CPU with pbrain and possibly braintwist extension and a few custom instructions (for a total of 15 instructions and NOP). But this year I’m not gonna have much time. Sigh…

  7. Hans Franke September 3rd, 2013 3:04 pm

    I like the idea of building the most minimalistic design – after all, it was the starting point that got me interested here. At the same time, my desire is not only to get the lowest possible chip count, but also a usable design for real applications.

    Imposing certain restrictions as part of such a competition could make it possible to reach for both, simplicity and power.

    As Petr already pointed out, it will be hard to compare when the race track isn’t laid out well and in a way that everyone runs in the same direction. So size and chip selection is one part, another will be the type of CPU with certain parameters, and last but not least, the rules to judge the winner (again, Petr’s posting). According to these 3 groups my suggestion would be as follows:

    0) Preface (What is it about)
    Creating a CPU with a technology (See A), a certain basic structure and interface to the ‘outer’ world (See B), being able to run a number of given test applications (See C) as part of the judging (See D).

    A) Technology (Hardware constraints)
    A.1) Board Size
    1 Euro Card size (160×100).

    To me half a Euro Card should be fine; depending on your skills, this will host up to 32 DIL chips (TTL-like). One argument for more space could be the limited availability of some chips – for example, a classic microprogram PROM would have been a 16-pin 256×4 300 mil DIL, and 4 would be enough for the Nibbler – it’s just getting hard to get them (not to speak of owning the right programmer). So people might replace them with two 27Cxx type 600 mil chips needing almost double the space – the same goes for RAM: in the old days, a 2114 (18-pin 300 mil) would give a whopping K of nibbles; nowadays it’ll be hard to get a new one, so at least some 6116 (24-pin 600 mil) is a valid replacement.

    (As for myself, I have a whole pile of 80s chips available 🙂)

    A.2) Chip types
    Maximum SSI/MSI chips – since this is hard to guarantee today, the rule should be “only thru-hole packages, available at Digikey”.

    This project has some retro touch, and thus a restriction to thru-hole chips would make the wanted appeal come through. Also, newer chips tend to have a huge transistor count, offering more functionality and so making it harder to compare. Having a 74xxxx designation alone isn’t good as a restriction. Also, it’s hard to define “available today”, so selecting a single large distributor as a benchmark seems to me the only possible way to go (it doesn’t mean you have to buy there).

    B) CPU Structure

    A 4-bit (data size) CPU with at least 256 nibbles of RAM and 256 instruction words, based around a ‘181 ALU, with an I/O interface able to address 16 I/O devices, either bidirectional or 16 in and 16 out.

    I think this outlines the Nibbler without making it so specific that only a certain solution is possible. The separation of I/O is important for section C to allow common test software to be written. With this definition even a logical 32-bit CPU is possible – with some rather huge microprogram :))

    C) Test Applications
    A number of tasks are to be defined to compare capabilities and performance in various ways, such as (not complete)
    – Control Task (like a microwave user interface)
    – Interactive Task (4-function Pocket Calculator, Number Guessing, Pong, Shooter, Card Game)
    – I/O intense task (Sound output or ‘video’)
    – Performance Task (Square Root, Trigonometric Functions, etc.)
    Any more?

    All applications have to be described in a complete but neutral way. Their design should take into account that the target machine is only 4 Bit wide and rather small 🙂

    D) Judging
    This is a foggy area as of now. Criteria I see are:

    – Chip Count (less is better)
    – Chip Complexity (lower is better)
    – Ability to perform the Test Programs at all
    – Program Size (smaller is better)
    – Absolute Performance (how fast the performance task is executed at maximum possible machine speed)
    – Relative Performance (how fast the performance task is executed at 1 MHz system clock – tbd, maybe ALU clock)

    last but not least:

    – Beauty of Design

    We all want some cool, awesome CPU, right? So we need to judge it. The problem is that this is also a gray area of personal preference. So I would suggest that everyone who enters the competition be asked to judge every other design on a 1..10 scale – the Beauty judgement being the average of all votes.

    Any comments?

    Exciting idea.

  8. Steve Chamberlin September 3rd, 2013 4:46 pm

    I’m pretty sure you guys are crazy, but in a good way. I didn’t really expect anybody to take my suggestion seriously. 🙂

    Setting constraints is just a way to make things interesting. Maximum simplicity (Nibbler) is one possible constraint, a limit on board area or chip count is another.

    I mentioned through-hole because it’s already limited to 7400 parts. I’m more interested in design simplifications, not in minimizing area by using SMD parts with tiny pins. If the constraint were number of chips instead of board area, then I guess SMD parts would be fine.

    I wasn’t thinking there would be any objective measure of “most powerful”, since it’s so hard to directly compare different architectures. Maybe you could use some standard benchmarks. But more likely the “winner” (if there was one at all) would be chosen by judging, like a baking competition at the county fair.

    I made some notes on this a few days ago – they are similar to what Hans suggested:

    Allowable Parts:
    7400 series, 7805, RAM (SRAM or DRAM), ROM (PROM, EPROM, EEPROM, Flash)
    passives, connectors, headers, switches, buttons, LEDs, can oscillators, crystals, power jacks
    NO: programmable parts, GALs, CPLDs, microcontrollers, etc
    All parts must be in stock at one of: Digikey, Mouser, Newark/Farnell, Jameco, Futurlec
    All parts must be through-hole
    Single or multi-board, max total board area 20 sq inches
    Parts can go on both sides of the board if desired

    Expansion Board:
    Max total board area 6 sq inches
    Can use any parts you want, not limited to 7400 series
    Can get parts from any source you want
    All parts must be through-hole
    Intended for I/O stuff like a display screen, clock, audio, video, network, motor, sensor
    NOT: another CPU, MCU, or logic that extends the CPU core

    Compete For:
    highest clock speed
    highest instructions/second
    highest benchmark score (what benchmark?)
    Most capable/coolest (subjective – judged)

  9. Hans Franke September 3rd, 2013 5:57 pm

    I think I can go with your suggestions – maybe some clarification:

    What is 6 or 20 square inches? It seems as if there is international interest here, so a more widely used measurement system would be a good idea.

    For suppliers, I would restrict the part check to two or at most three with a large selection browsable via the web. Too many will make any judgement hard – also, I would in this case call for including at least two well-known distributors in Germany – and one of them can provide almost any 74xx ever built :))

    Since expansion boards are banned from having any logic extension to the core CPU (other than using defined I/O interfaces), I would allow not only any kind of chip package (not just thru-hole), but also any kind of chip, including CPUs/MCUs. After all, banning them would also ban LCD panels and similar systems.

    Not sure if highest clock is a real measurement. Adding a 100 MHz clock and dividing it by 16 will do the trick 🙂

    For the complexity vs. performance, we could maybe use a measurement based on chip count and performance (as average from the benchmarks). If we take the average chip count from all projects (just core board) and plot it against the performance, we should see a nice distribution. By normalizing and multiplying the numbers, a single score for the overall ‘best’ will be reached.

  10. Petr Polášek September 7th, 2013 12:45 am

    I think that clocking with 1GHz or more and dividing with U664B/U983B/SAB6456 (thus getting a clock of less than 20MHz) should get a prize for the most insane and useless design.

  11. Ale September 9th, 2013 9:56 pm

    If you are looking for a powerful 7400-based computer the PDP-11/45 is a very good example. It even has MMU and floating point ! and un*x 🙂
    A very nice and simple design you have there. It could maybe, just maybe, be simplified a bit using a serial bus… it’s a possibility; the registers would then be shift registers. Maybe for wider buses.
    Thanks for sharing, great stuff 🙂

  12. Peter Hizalev December 16th, 2015 8:00 am

    Hi Steve. I am curious how RAM reads work: since CSRAM is ORed with CLK, the RAM will drive the data bus up until CLK is rising. And the ‘173 is written on the rising CLK edge and requires some hold time for the data bus, which won’t be present. How does this work?

  13. Steve Chamberlin December 16th, 2015 8:36 am

    It’s not actually ORed – the CLK is connected directly to the RAM’s /CE input. This ensures that the RAM is only enabled during the second half of each clock cycle, which prevents spurious writes to RAM during the first half, when the control signals are still changing state and might not yet have the correct values. You raise a good point about the hold time on the ‘173. The 74HCT173 has a negative hold time of -4 ns at 5V, so in this case it’s not a problem. Even with a small positive hold time, it would probably still be OK, because the RAM outputs won’t change instantly at the clock edge. But relying on that is probably a risky design choice.

  14. Peter Hizalev December 16th, 2015 9:02 am

    I am experimenting with 74LS173, which has +3 ns hold time. Once in a while I get bad RAM reads–probably this is it.

  15. Peter Hizalev December 16th, 2015 9:10 am

    Just looked at the CY7C168A datasheet, and “CE HIGH to high Z” has no min and a max of 8 ns, so I understand it has a chance of going into high impedance before the 74LS173 hold time has elapsed.
