Aha! After four days of tinkering with Nibbler, I finally found the cause of the occasional bad writes to RAM. At first I thought it was a timing problem with the RAM enable signals, then I thought it was bus contention, but the key piece of evidence was the logic analyzer trace you see above. The X cursor marks the start of a clock cycle. The CPU is attempting to take the number 2 in the accumulator, pass it through the ALU, and write it to RAM. Shortly after the start of the clock cycle, you can see that the ALU function inputs glitch briefly. After that, the ALU outputs all adopt the same values as the accumulator, except for ALU1, which demonstrates some crazy noise. This only happens rarely – maybe one in ten thousand writes to RAM – but when it happens the wrong value gets stored.
What could cause that horrible-looking signal on ALU1? The ALU is just passing through the value of A, and A looks fine, as do the ALU function inputs S, M, and Cin. The sneaky answer is that the problem is caused by the ALU’s B input, which isn’t even being used during this operation.
The diagram on the left shows the problem. When the ALU bus driver is enabled, the ALU result value is driven onto the data bus, where it makes its way back to the ALU’s B input. I thought this was OK, as long as the ALU function was set to something that only used the A input, and was independent of B. From a logical standpoint, that’s true, but from an electrical standpoint it’s not. Even though the value at the B input is logically irrelevant, if an invalid voltage around 2.5v appears at the B input, it will result in an invalid voltage at the ALU output. The bus driver has the same logic thresholds, so it also sees an invalid input voltage and produces an invalid output voltage, which appears back at the ALU’s B input, completing the feedback cycle. Garbage in, garbage out.
This should be a rare occurrence, and it is. Any little noise or voltage drift that pushes the bus to a valid 0 or 1 voltage will break the cycle. My suspicion is that in some circumstances, the internal structure of the ALU (a 74LS181) is such that a negative feedback loop is created on one of the bus lines. If the bus line voltage drifts up by epsilon, the ALU will output a voltage that’s lower by epsilon, which will be reflected at the bus driver output, counteracting the drift. It would be similar to connecting the output of an inverter to its input.
My solution is shown in the diagram on the right. A 74HCT157 two-input multiplexer was added to the ALU’s B input. Normally it passes the data bus value through to the B input, but when the ALU drives its result onto the bus, the mux passes zero to the B input instead. It doesn’t really matter what value is passed to the B input, as long as it’s something valid.
At first I was reluctant to call this “the cause of the problem”, because I’ve been through so many other apparent solutions in the past few days. At one point I thought that adding capacitors to the data bus fixed the problem, then replacing the bus driver HCT chip with an LS chip, or the fetch register. But none of those solutions actually explained why things didn’t work originally, nor why they fixed the problem. And after more careful testing, replacing the bus driver or fetch register with LS-family chips didn’t actually fix the problem 100% of the time. The combinatorial loop is the only scenario that explains why things weren’t working originally, and that works 100% reliably in all the tests I’ve made after adding the mux.
I’m happy to have finally found the answer to this mystery, but a little unhappy with the form the solution takes. Looking at the revised architecture diagram, it’s not at all obvious to the casual observer why there should be a mux there. The fact that it’s required for electrical reasons and not logical reasons is even worse. It just doesn’t feel “clean”, in some hard to define way. Perhaps there’s a better solution, but at this point I’ve spent so much time trying to fix hardware this problem, I just want to move on to writing more fun Nibbler software now.
Read 18 comments and join the conversation