Archive for the 'Nibbler' Category
After fiddling with Nibbler’s hardware and fixing its glitches, it’s time to write some demo software. Let’s see what this little handmade 4-bit CPU can do! Sorry the quality on these videos isn’t the best – try changing the quality setting to 480P to get a bit more picture detail.
Gamers of a certain age will doubtless remember Mastermind, a code-breaking game for two players, which is based on an old pencil and paper game called Bulls and Cows. The codemaker chooses a secret code that’s four elements in length, where each element can be one of six possible colors. The codebreaker then has ten chances to guess the secret code. After each guess, the codemaker gives feedback in the form of black or white pegs: a black peg means some element is the right color and in the right position, and a white peg means some element is the right color but in the wrong position. The feedback pegs are position-independent, so a black peg doesn’t tell the codebreaker which of the four elements in the guess was correct.
Adapting Mastermind to Nibbler was relatively simple. A few weeks ago I wrote a Guess the Number program for testing purposes, and most of it was reusable for Mastermind. The final program was 2057 bytes, or just over half of Nibbler’s available program memory. Instead of using colored pegs, the code is a 4-digit number, where each digit is between 0 and 5. I added a few little bells and whistles, like button feedback sounds, and a little victory tune when you guess the secret code correctly. The result is surprisingly fun, if you like these kinds of logic puzzles. It works fine in the Nibbler simulator too, if you want to try it out.
I wanted to focus on music and audio next. Nibbler doesn’t have any real audio hardware, only a speaker that’s directly connected to a digital output pin. Making sounds is as simple as toggling the output quickly between 0 and 1, but making specific sounds at just the right frequency is more complicated. It requires a lot of cycle counting math, to guarantee the speaker will be toggled at exactly the right rate for an A at 440 Hz or a piano’s middle C (261.6 Hz). Each period of the waveform needs to be exactly the same length, even as the path through the inner code loop varies in length due to carry propagation with multi-nibble counters. If one period is slightly off, your ear will hear it.
Setting the duration of each note involves more math, dividing the duration by the period length to find the number of periods to play. That means two notes at different frequencies but with the same duration will have different repeat counts (duration values) in the code, further adding to the necessary bookkeeping.
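To make the bookkeeping concrete, here is a minimal Python sketch of the two calculations described above. It assumes every instruction takes exactly one microsecond (2 MHz clock, 2 clocks per instruction); the function names are made up for illustration and are not taken from any Nibbler tool.

```python
# Assumption: Nibbler executes 1,000,000 instructions per second
# (2 MHz clock, 2 clocks per instruction).
INSTRUCTIONS_PER_SECOND = 1_000_000

def half_period_instructions(freq_hz):
    """Instructions between speaker toggles (two toggles per waveform period)."""
    return round(INSTRUCTIONS_PER_SECOND / (2 * freq_hz))

def periods_for_duration(freq_hz, duration_s):
    """Number of whole waveform periods needed to fill the requested duration."""
    return round(freq_hz * duration_s)

for name, freq in (("A", 440.0), ("middle C", 261.6)):
    print(name, half_period_instructions(freq), periods_for_duration(freq, 0.5))
```

Two notes of the same half-second length end up with different repeat counts (220 periods for the A, 131 for middle C), which is exactly the extra bookkeeping the hand-written code had to track.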
I started writing a music demo by hand, but it was such a pain that I couldn’t imagine building a whole song that way. Instead, I created a new tool called Music Maker to do the math and code creation for me. It takes song data as input, and generates a Nibbler assembly program as output. The song data is expressed in the Music Macro Language from Microsoft Basic, which you might recall if you ever used GWBASIC’s PLAY command. This simple one-voice text-based format is a perfect match for Nibbler’s limited audio capabilities. Songs are described as a series of notes, with optional length and octave modifiers:
"T180 DF#A L2 A L4 O4 AA P4 F#F# P4 O3 D"
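As a rough illustration of how a string like this can be broken apart, here is a small Python tokenizer sketch. It is not Music Maker's actual parser; it only assumes the usual PLAY conventions (T for tempo, L for default note length, O for octave, P for pause, and A through G for notes with an optional sharp or flat), and the `tokenize` name is hypothetical.

```python
import re

# One command letter, an optional accidental, and an optional number.
TOKEN = re.compile(r"([A-GTLOP])([#+-]?)(\d*)")

def tokenize(mml):
    """Split an MML string into (letter, accidental, number-or-None) tuples."""
    compact = mml.upper().replace(" ", "")
    return [
        (letter, accidental, int(number) if number else None)
        for letter, accidental, number in TOKEN.findall(compact)
    ]

song = "T180 DF#A L2 A L4 O4 AA P4 F#F# P4 O3 D"
tokens = tokenize(song)
notes = [t for t in tokens if t[0] in "ABCDEFG"]
print(len(tokens), "tokens,", len(notes), "notes")
```

A real converter would then walk the tokens, tracking the current tempo, octave, and default length, and emit one tone loop per note.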
Music Maker saved me a huge amount of effort, but I still needed to hand-edit the generated assembly code to repeat a few musical phrases in the right spots. Because Nibbler’s instruction set isn’t well-suited to storing constant data in programs, the code is fairly bloated, taking 3569 bytes (87% of memory) for a song that’s a few dozen measures long. I chose the Blue Danube for the demo song.
For the last demo I wanted to do something really different, so I created a real-time action game using custom graphics. The classic arcade game Frogger leapt to mind. The player must guide a frog across many lanes of traffic and a treacherous river, avoiding a variety of obstacles moving in different directions and at different speeds.
Bringing Frogger to Nibbler presented several challenges. First – graphics. The HD44780 chip inside the 16×2 character LCD screen does support user-defined character fonts, but only for eight characters, which doesn’t provide a lot of variety. Second – screen size. The screen only has two rows, and that’s not many lanes of traffic for our frog to dodge. How can you make a game with that?
Each character on the LCD screen is 5×8 pixels, so the total screen height is 16 pixels. My approach was to divide the screen into four virtual rows, each of which was 4 pixels tall, with two virtual rows per actual row of text characters. Then I created custom character fonts for each possible combination of contents in the cells in the upper and lower virtual rows. Each cell can contain one of three possible items (an obstacle, the frog, or nothing), and there are two cells, so there would seem to be 3×3 or 9 permutations. But because there’s only one frog in the game, the permutation with frogs in both cells isn’t needed, and the remaining eight permutations fit exactly into the HD44780’s eight user-defined character slots. The result is a 16×4 virtual playfield, where each playfield cell is 5×4 pixels. For a bit of added variety, I also made the upper and lower obstacles look visually distinct.
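The counting argument can be checked with a few lines of Python. This is only a sketch of the enumeration; the item names and the slot numbering are arbitrary choices for illustration, not the order used in the actual game.

```python
from itertools import product

CELL_CONTENTS = ("empty", "obstacle", "frog")

def custom_characters():
    """All (upper, lower) cell pairs except the impossible frog-over-frog case."""
    return [
        (upper, lower)
        for upper, lower in product(CELL_CONTENTS, repeat=2)
        if not (upper == "frog" and lower == "frog")
    ]

# Hypothetical mapping from a cell pair to one of the eight custom character slots.
slot_for = {pair: slot for slot, pair in enumerate(custom_characters())}
print(len(slot_for), "custom characters needed")
```

With a table like this, drawing one LCD character position comes down to looking up the pair of virtual cells it covers.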
To make the game “real-time”, I needed a way to animate the obstacles while the player was moving. Without interrupts or any other real time-keeping mechanism, I had to add a timeout counter inside the busy loop that checks for button input. After a few thousand checks of the button state, the game jumps to a routine that moves the lanes of obstacles by one position, then returns to button checking. Fortunately it all happens fast enough that you can’t notice any hiccups.
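The structure of that loop can be sketched in Python as follows. The real game is written in Nibbler assembly and uses a count of a few thousand polls; this sketch uses a list of pre-recorded button samples and a tiny timeout so the behavior is easy to follow, and all names are hypothetical.

```python
def run_polls(button_samples, timeout_polls):
    """Poll the buttons, scrolling the lanes once every timeout_polls checks."""
    events = []
    countdown = timeout_polls
    for pressed in button_samples:
        if pressed:
            events.append("move_frog")
        countdown -= 1
        if countdown == 0:
            # The timeout expired: animate the obstacles, then resume polling.
            events.append("scroll_lanes")
            countdown = timeout_polls
    return events

print(run_polls([False, True, False, False], timeout_polls=3))
```

Because the counter stands in for a clock, the animation speed depends on how long one pass through the polling loop takes.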
Frogger was the only demo where the lack of an indirect addressing mode really hurt. Tasks like animating the playfield or checking for collisions are just screaming out for indirect addressing. Without it, I had to write code that explicitly copies/checks/moves each of the 64 playfield cells. The result is lots of ugly bloat, and a code size of 3279 bytes (80% of memory).
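To give a feel for the bloat, here is a hedged Python sketch that generates the kind of unrolled code this forces. It borrows the ld and st mnemonics and the $xxx address style from Nibbler's RAM test programs, but the memory layout (one RAM nibble per cell, a lane starting at $040) is purely an assumption for illustration, not the game's real layout.

```python
def unrolled_shift_left(base_address, cells):
    """Emit ld/st pairs that copy each cell into its left-hand neighbour.

    The rightmost cell is left untouched; the caller would fill it separately.
    """
    lines = []
    for i in range(cells - 1):
        lines.append(f"ld ${base_address + i + 1:03X}")
        lines.append(f"st ${base_address + i:03X}")
    return lines

lane = unrolled_shift_left(0x040, 16)  # hypothetical lane at $040, 16 cells wide
print(len(lane), "instructions to shift one lane")
```

With indirect addressing the same job would be a short loop; without it, every lane shift and every collision check has to be spelled out cell by cell.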
The final game turned out nicely! The only big flaw is the animation: the persistency of the LCD screen makes objects appear to fade and flicker, but there’s nothing I can do about that. Despite this, it’s a lot of fun to play. In the video, notice that the lanes of obstacles animate at different speeds, and each lane moves in the opposite direction from its neighbors, making the game more challenging than it first appears. If you can navigate the frog upwards from the bottom row, and escape out the top row, you’ll hear a little victory tune. Go Nibbler!
What happens when you feed an invalid voltage level to the input of a digital logic gate? It’s an interesting story. In my last post, I described a subtle Nibbler bug where a combinatorial feedback loop through the ALU caused an invalid voltage level to propagate from an output back to the input, creating a self-sustaining cycle of badness. I fixed the problem by adding another chip to break the loop, but I wasn’t totally happy with that solution. After studying the problem further, and peering into the internal details of the 74LS181 ALU chip, I’ve found a new solution that doesn’t require any extra hardware. The new chip has been removed, and Nibbler still runs flawlessly.
To explain what’s going on, it’s important to understand how basic logic gates like NOR and AND are built from transistors. The diagrams below show gates using FET transistors, instead of the bipolar transistors used in the 7400LS family, but the concept is the same.
In the NOR gate, if either A or B is above the transistor switching threshold, one or both of the bottom transistors will be turned on. This pulls the output to ground. The top transistors will be turned off. If neither A nor B is above the switching threshold, neither of the bottom transistors will be turned on, but both of the top transistors will be on, pulling the output to Vcc.
In the AND gate, the top and bottom portions are reversed, and an inverter is placed at the output of the first stage to create an AND instead of a NAND.
Now let’s assume B is at an invalid voltage level, right around the transistor switching threshold, while A is zero volts. In the NOR gate, one of the bottom transistors will be off, and the other will be midway between off and on, acting similar to a resistor. One of the top transistors will be fully on, and the other will be midway between off and on. The result is that both the top and bottom sections will be partly on, creating a path from Vcc to ground, and resulting in an output voltage that’s somewhere in the middle.
If B was invalid while A was at Vcc, then one of the bottom transistors would be fully on, and it wouldn’t matter that the other bottom transistor was midway between off and on. Similarly, one of the top transistors would be fully off, so the other transistor wouldn’t matter. The output would be 0 volts regardless of the voltage at B.
Applying the same reasoning to the AND gate, these rules emerge:
- ? NOR 0 = ?
- ? NOR 1 = 0
- ? AND 0 = 0
- ? AND 1 = ?
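These rules amount to a three-valued logic, which can be modeled in a few lines of Python. This is a simplified sketch: it treats "?" as a single symbolic value, whereas the real circuit has a continuum of in-between voltages.

```python
X = "?"  # stands for an invalid, in-between voltage level

def and_(*inputs):
    """A 0 on any input forces the output low, whatever the other inputs are."""
    if 0 in inputs:
        return 0
    return X if X in inputs else 1

def nor_(*inputs):
    """A 1 on any input forces the output low, whatever the other inputs are."""
    if 1 in inputs:
        return 0
    return X if X in inputs else 1

for gate, name in ((nor_, "NOR"), (and_, "AND")):
    for other in (0, 1):
        print(f"? {name} {other} = {gate(X, other)}")
```

The loop prints the same four rules listed above.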
Now let’s apply these rules to a section of the 74LS181 internal logic, taken from its datasheet. In the case where Nibbler was failing, the ALU function input S was all 1’s, the A input was also all 1’s, and the B input was momentarily at an invalid voltage level. The blue numbers show the propagation of values through the first input stage of the ALU.
In this case where S is all 1’s, the ALU output is supposed to be A, and the B input shouldn’t matter. And logically, that’s what happens. The three gates on the right compute the expression A0*/B0*S2 NOR A0*B0*S3, which here reduces to /B0 NOR B0, which is always 0 regardless of the value of B0. So B is irrelevant. But electrically it’s a different story. The ? values propagate through the gates. ? NOR ? is not 0. The ALU outputs ? values to the data bus, which eventually propagate back to its B input, continuing the vicious cycle. The circuit doesn’t work.
But wait! On the ’181 ALU, there are actually two different ways to output the value of A. I arbitrarily chose one of them when I designed the microcode. The other way sets the ALU function input S to all 0’s instead of all 1’s. What happens in that case?
The ? values are stopped dead at the first input stage, and don’t propagate any further into the ALU. The circuit works.
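The difference between the two settings can be traced with a small Python model of just the expression quoted above, A0*/B0*S2 NOR A0*B0*S3. It is a sketch of that one slice only, not of the whole 74LS181, and it treats "?" as a single symbolic value.

```python
X = "?"  # an invalid voltage level

def not_(a):
    return X if a == X else 1 - a

def and_(*inputs):
    if 0 in inputs:
        return 0
    return X if X in inputs else 1

def nor_(*inputs):
    if 1 in inputs:
        return 0
    return X if X in inputs else 1

def input_stage(a0, b0, s2, s3):
    """(A0 * /B0 * S2) NOR (A0 * B0 * S3), the expression quoted above."""
    return nor_(and_(a0, not_(b0), s2), and_(a0, b0, s3))

print("S = 1111:", input_stage(1, X, 1, 1))  # the invalid level gets through
print("S = 0000:", input_stage(1, X, 0, 0))  # the AND gates block it
```

With S at all 0’s, each AND gate has a 0 on one input, so its output is a clean 0 no matter what B is doing, and nothing invalid reaches the NOR gate.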
Applying this to Nibbler was easy. I just changed the microcode to use the 0000 output function instead of the 1111 function, and removed the extra chip that I previously added to break the feedback loop. Happy times!
Aha! After four days of tinkering with Nibbler, I finally found the cause of the occasional bad writes to RAM. At first I thought it was a timing problem with the RAM enable signals, then I thought it was bus contention, but the key piece of evidence was the logic analyzer trace you see above. The X cursor marks the start of a clock cycle. The CPU is attempting to take the number 2 in the accumulator, pass it through the ALU, and write it to RAM. Shortly after the start of the clock cycle, you can see that the ALU function inputs glitch briefly. After that, the ALU outputs all adopt the same values as the accumulator, except for ALU1, which demonstrates some crazy noise. This only happens rarely – maybe one in ten thousand writes to RAM – but when it happens the wrong value gets stored.
What could cause that horrible-looking signal on ALU1? The ALU is just passing through the value of A, and A looks fine, as do the ALU function inputs S, M, and Cin. The sneaky answer is that the problem is caused by the ALU’s B input, which isn’t even being used during this operation.
The diagram on the left shows the problem. When the ALU bus driver is enabled, the ALU result value is driven onto the data bus, where it makes its way back to the ALU’s B input. I thought this was OK, as long as the ALU function was set to something that only used the A input, and was independent of B. From a logical standpoint, that’s true, but from an electrical standpoint it’s not. Even though the value at the B input is logically irrelevant, if an invalid voltage around 2.5v appears at the B input, it will result in an invalid voltage at the ALU output. The bus driver has the same logic thresholds, so it also sees an invalid input voltage and produces an invalid output voltage, which appears back at the ALU’s B input, completing the feedback cycle. Garbage in, garbage out.
This should be a rare occurrence, and it is. Any little noise or voltage drift that pushes the bus to a valid 0 or 1 voltage will break the cycle. My suspicion is that in some circumstances, the internal structure of the ALU (a 74LS181) is such that a negative feedback loop is created on one of the bus lines. If the bus line voltage drifts up by epsilon, the ALU will output a voltage that’s lower by epsilon, which will be reflected at the bus driver output, counteracting the drift. It would be similar to connecting the output of an inverter to its input.
My solution is shown in the diagram on the right. A 74HCT157 two-input multiplexer was added to the ALU’s B input. Normally it passes the data bus value through to the B input, but when the ALU drives its result onto the bus, the mux passes zero to the B input instead. It doesn’t really matter what value is passed to the B input, as long as it’s something valid.
At first I was reluctant to call this “the cause of the problem”, because I’ve been through so many other apparent solutions in the past few days. At one point I thought that adding capacitors to the data bus fixed the problem, then replacing the bus driver HCT chip with an LS chip, or the fetch register. But none of those solutions actually explained why things didn’t work originally, nor why they fixed the problem. And after more careful testing, replacing the bus driver or fetch register with LS-family chips didn’t actually fix the problem 100% of the time. The combinatorial loop is the only scenario that explains why things weren’t working originally, and that works 100% reliably in all the tests I’ve made after adding the mux.
I’m happy to have finally found the answer to this mystery, but a little unhappy with the form the solution takes. Looking at the revised architecture diagram, it’s not at all obvious to the casual observer why there should be a mux there. The fact that it’s required for electrical reasons and not logical reasons is even worse. It just doesn’t feel “clean”, in some hard to define way. Perhaps there’s a better solution, but at this point I’ve spent so much time trying to fix this hardware problem, I just want to move on to writing more fun Nibbler software now.
Yesterday’s Nibbler celebration was premature – I discovered that about one in ten-thousand writes to RAM stores the wrong value. Bad RAM, bad! No biscuit for you!
Arghh, what a headache. I first discovered the problem while making improvements to the guess-the-number game. After many experiments, I was able to boil it down to a case where $F is written to RAM, but something else is read back. And I proved that it’s the write operation that’s going bad, not the following read. But that’s about as far as I’ve gotten in understanding why it fails, or how to fix it.
; Example 1 - fails consistently
#define TEST_LOCATION $038
testram:
    lit #0
    addi #15
    st TEST_LOCATION
    ld TEST_LOCATION
    cmpi #15
    jz testram
fail:
    ; turn on the debug LED
Example 1 adds 15 to 0, stores the result, reads it back, and checks to make sure it’s 15. If not, it turns on the debug LED to indicate a failure. This test fails consistently, after anywhere from zero to 10 seconds of operation.
; Example 2 - works reliably
#define TEST_LOCATION $038
testram:
    lit #15
    addi #0
    st TEST_LOCATION
    ld TEST_LOCATION
    cmpi #15
    jz testram
fail:
    ; turn on the debug LED
Example 2 is identical to example 1, but it adds 0 to 15 instead of 15 to 0. This test works reliably. How can that be? After all, in both examples the entire CPU is in the exact same state at the time of the store instruction. Wait, maybe it’s not the store that’s going wrong at all! Maybe the addi is faulty and computing the wrong sum?
; Example 3 - works reliably
#define TEST_LOCATION $038
testram:
    lit #0
    addi #15
    cmpi #15
    jz testram
fail:
    ; turn on the debug LED
Nope. Removing the store/load from example 1, which failed consistently, now works reliably. Head, meet wall. Bang, bang, bang.
Click on the simplified schematic at the top of this post to see what’s involved while the accumulator value is being written to RAM.
My best guess is that at the end of a RAM write, either the data or the address are changing before the RAM /CS is de-asserted. /CS comes from a 74LS32 OR gate with a max propagation delay of 22 ns. The clock is one of the OR inputs, so /CS will be de-asserted no more than 22 ns after the rising edge of the clock. Could the RAM data or address be changing during this window?
Address: The high address bits come from the Fetch register, whose value never changes at the same time as a RAM write. The low address bits come from the program ROM, which has a 150 ns propagation delay, on top of the PC register’s 39 ns tcq delay, so it seems very unlikely those values could change within 22 ns of a clock edge.
Data: The data is a little more complicated. The value coming from the accumulator won’t change, but the ALU function might, or bus driver B might become disabled, or something else might start driving the data bus and cause contention. All of those would require changing control signals in order to happen. Control signals come from the microcode ROMs, which have a 150 ns propagation delay, on top of tcq delays of about 30 ns for the registers at their inputs. So it seems unlikely the data values could be changing within 22 ns of a clock edge either.
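The arithmetic behind that reasoning is simple enough to write down. The Python sketch below just takes the delay figures quoted above at face value; note that these are the kind of numbers a datasheet gives as maximum or typical delays, so treating them as the earliest moment a signal can change is itself an assumption.

```python
CS_DEASSERT_MAX_NS = 22  # 74LS32 worst-case delay from clock edge to /CS

# Delay from the clock edge until each signal could change, per the figures above.
PATH_DELAYS_NS = {
    "low address bits (PC tcq + program ROM)": 39 + 150,
    "data control signals (register tcq + microcode ROM)": 30 + 150,
}

def margin_ns(path_delay_ns):
    """How long after /CS de-asserts the signal would first start to change."""
    return path_delay_ns - CS_DEASSERT_MAX_NS

for name, delay in PATH_DELAYS_NS.items():
    print(f"{name}: changes after {delay} ns, margin {margin_ns(delay)} ns")
```

On paper both paths have well over 100 ns of margin, which is why a simple timing violation looks like an unlikely explanation.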
A few other things I tried:
- Replaced the 74HCT244 bus driver with a 74LS244 – This helped a lot, but didn’t completely eliminate the problem.
- Changed TEST_LOCATION to $000 – The test still failed intermittently, but not as much as before
- Changed TEST_LOCATION to $FFF – The test passed reliably
I’m not even 100% certain that the problem is with address or data becoming invalid before the /CS de-assert. Maybe address isn’t valid before the /CS assert, or maybe there’s a glitch on /CS or /WE at some other time. But I don’t think so.
What I really need to do is hook up an oscilloscope or logic analyzer, and look at the relative timing of the clock, /CS, data, and address to see what’s going on. Unfortunately I only have one working scope probe, and even if I had more, I’m not sure the scope has enough resolution to see a timing error of a few nanoseconds. And even if I can demonstrate that data or address are changing before /CS de-assert, I’m not sure what I could do to fix it without major changes. Hmmmm…
YES, IT WORKS! The Nibbler homemade CPU is up and running, and it can play “guess the number” like nobody’s business. Awesome! If you haven’t seen my earlier posts, Nibbler is a 4 bit CPU built from standard 7400 series logic chips – individual counters, registers, buffers, and gates. It’s an educational example of a simple CPU that’s easy to understand and build, but still capable of running games and other interesting programs. The CPU contains 17 chips in total.
With all the advance design work and simulation that went into Nibbler, I was fairly confident it would work, but it’s sure nice to see it confirmed in real hardware. Construction was slow and tedious, but went fairly smoothly. After soldering the buttons and passive components, the chips were mounted in sockets and connected by wire-wrap. The whole construction process took about five days.
- 4 bit custom-made CPU
- 12 bit addressing, 4K address space size
- Harvard architecture – separate program and data memory spaces
- 2 MHz clock speed, 2 clocks per instruction = 1 million instructions/sec
- Thirteen 7400-series chips in the CPU data and control paths
- Two 28C16 EEPROMs store microcode for generating internal control signals
- 4K x 4 SRAM (CY7C168A)
- 28C64 EEPROM stores programs
- Four pushbutton inputs
- 16 x 2 character LCD
- Audio speaker
- About 100 mA current draw at 5V
For lots more detail on how Nibbler works, see the project home page.
Fun with Debugging
During construction, I ran into a couple of errors in my schematic. I accidentally reversed the Zero and Carry inputs to the microcode ROMs. You’d think that would be an easy thing to catch – notice that all the instructions involving the Carry flag didn’t work, double-check the wiring, and aha! Unfortunately it didn’t happen that way at all.
At the time I ran into this problem, I was part-way through construction, and I hadn’t yet connected the Flags register’s Zero input to anything. It was just floating. Because of my schematic mistake, the CPU was actually using this undefined Zero flag when it thought it was using the Carry flag. The crazy thing was that it still worked sometimes! After powering on the CPU, the Carry flag appeared to work normally and programs ran fine for about two seconds, after which the whole thing went to hell. Because it worked briefly, I assumed there was some kind of overheating problem, or maybe a timing problem caused by changing propagation delays as the chips warmed up. I spent almost a whole day trying to track down the problem. At least it gave me an excuse to use my logic analyzer!
How is it possible that the CPU worked for a few seconds, when it was using the completely wrong flag? It turns out that because Zero wasn’t yet connected to anything, and the Zero and Carry wires ran side-by-side, capacitive coupling caused the floating Zero input to follow the value of the Carry flag, at least for a little while. What?! Yup, the combination of reversed wires and an unconnected input caused the right value to jump to the wrong wire. After a few seconds, some stray capacitance somewhere charged up to the point where this little trick no longer worked, and the CPU conked out. As soon as I swapped the wires and connected the Zero input properly, the CPU began working nicely.
Son of BMOW 1
It’s interesting to compare Nibbler with BMOW 1, my first homemade CPU. Compared with BMOW 1, Nibbler is tiny! The whole thing fits in the palm of my hand. BMOW 1 looks like a giant in comparison. And Nibbler has just 17 chips, all of which are either memory or 7400 series logic. BMOW 1 has 65 chips, many of which are GALs (programmable logic). The size discrepancy would be even greater if BMOW 1 were constrained to use only 7400 series logic.
How can Nibbler be that much smaller? The major difference is that BMOW 1 supports lots of I/O devices that Nibbler doesn’t, like a PS/2 keyboard, VGA video, a serial port, 3-voice audio, and a real-time clock. All those device support chips really add up. BMOW 1 also has wider data and address paths, and has a hardware stack register and some other CPU features that Nibbler lacks. BMOW 1 is a very cool machine, but it carries a lot of bloat, making it difficult for people to understand how it all works. Nibbler is more like a minimal CPU example, with just enough power for a few basic games and demos.
Because this is the “Big Mess o’ Wires” blog, I’ll conclude with some gratuitous wiring photos. Enjoy!
Nibbler construction is underway! I’m building the CPU on a 5 x 4 inch perfboard, and connecting the components using wire wrap. This particular perfboard is a Twin Industries 8200 Series, which has continuous metal planes on both sides that can be used for power and ground. Because the holes don’t go all the way to the edges of the board, the usable area is just 4.5 x 3.5 inches (114 x 89 mm) – about the same size as my mobile phone. It’s very compact. There’s enough extra space for two more 0.3 inch chips, if I later decide to add to the system.
Unfortunately construction didn’t start smoothly. The board didn’t have any mounting holes, and you can see in the photo what a lousy job I did trying to drill four evenly-spaced holes in the corners. After drilling, I made legs by putting screws through the top into 1 inch aluminum standoffs underneath. I quickly discovered that top-side screws in bottom-side standoffs formed an electrical connection, shorting the ground and power planes! Ugh. Inserting nylon washers between the standoffs and the bottom of the board solved that problem, but it wasn’t the last of the challenges.
These kinds of boards are great for mounting and connecting DIP chips, but not so great for anything else like discrete components or buttons. Anything that doesn’t have 0.1 inch spaced pins requires bending or drilling, and discretes also require soldering instead of wire wrap. It took me several hours to drill holes for the DC power jack, deform the button pins to a 0.1 inch spacing, and mount and solder all the other passive components. In many cases that involved cutting a wire so it could be soldered at one end, but wrapped at the other end. What a pain in the butt.
Soldering anything to this board is surprisingly hard. Each hole is separated from the plane that surrounds it by a circular ring of soldermask. In order to connect a discrete component’s pin to the power or ground planes, the solder must bridge across this ring, and that’s something the solder definitely doesn’t like to do. More and more solder must be added, until it suddenly flows across the ring *and* into the neighboring hole, creating an accidental short. Grrrr! With practice I was able to avoid this, but each solder point was a trial of patience.
After placing all the discrete components, I did a trial fit of all the wire wrap sockets, mounting all the chips inside them. It’s a good thing I did, because there were clearance problems all over the place! If I’d wired up half the system before discovering the clearance issues, I would have been in big trouble. The worst problem was the 74HCT74 for Phase and Reset. Although it’s a 14-pin chip, its package is so elongated, it actually needs more room than other 16-pin chips! I also found a few vertical clearance problems. The LCD is supposed to sit on top of several chips, but I discovered there wasn’t as much space as I’d expected available under the LCD. I’m going to have to build some kind of vertical extension for the LCD, in order to raise it high enough to clear the chips under it.
The last big hurdle was the ZIF socket for the program ROM. I know from past experience that I’ll be swapping that ROM in and out frequently, and I don’t want to have to use a chip puller each time. A ZIF socket will solve the problem, but wire wrap ZIF sockets don’t exist, or are so rare that it amounts to the same thing. I ended up soldering a standard ZIF socket to two rows of machine pin male header, and then inserting the header into a wire wrap socket. It’s not pretty, and the socket sandwich is more than an inch tall, but it works!