
Sneaky Combinatorial Feedback Bugs

Aha! After four days of tinkering with Nibbler, I finally found the cause of the occasional bad writes to RAM. At first I thought it was a timing problem with the RAM enable signals, then I thought it was bus contention, but the key piece of evidence was the logic analyzer trace you see above. The X cursor marks the start of a clock cycle. The CPU is attempting to take the number 2 in the accumulator, pass it through the ALU, and write it to RAM. Shortly after the start of the clock cycle, you can see that the ALU function inputs glitch briefly. After that, the ALU outputs all adopt the same values as the accumulator, except for ALU1, which demonstrates some crazy noise. This only happens rarely – maybe one in ten thousand writes to RAM – but when it happens the wrong value gets stored.

What could cause that horrible-looking signal on ALU1? The ALU is just passing through the value of A, and A looks fine, as do the ALU function inputs S, M, and Cin. The sneaky answer is that the problem is caused by the ALU’s B input, which isn’t even being used during this operation.

The diagram on the left shows the problem. When the ALU bus driver is enabled, the ALU result value is driven onto the data bus, where it makes its way back to the ALU’s B input. I thought this was OK, as long as the ALU function was set to something that only used the A input, and was independent of B. From a logical standpoint, that’s true, but from an electrical standpoint it’s not. Even though the value at the B input is logically irrelevant, if an invalid voltage around 2.5v appears at the B input, it will result in an invalid voltage at the ALU output. The bus driver has the same logic thresholds, so it also sees an invalid input voltage and produces an invalid output voltage, which appears back at the ALU’s B input, completing the feedback cycle. Garbage in, garbage out.

This should be a rare occurrence, and it is. Any little noise or voltage drift that pushes the bus to a valid 0 or 1 voltage will break the cycle. My suspicion is that in some circumstances, the internal structure of the ALU (a 74LS181) is such that a negative feedback loop is created on one of the bus lines. If the bus line voltage drifts up by epsilon, the ALU will output a voltage that’s lower by epsilon, which will be reflected at the bus driver output, counteracting the drift. It would be similar to connecting the output of an inverter to its input.
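The inverter comparison can be illustrated with a toy numerical model. This is only a sketch under loud assumptions: the transfer curve is an idealized sigmoid, and the supply voltage, threshold, and gain values are hypothetical, not measurements of a 74LS181 or of Nibbler. It shows that an inverting stage whose output feeds its own input has a balance point at a voltage that is neither a valid 0 nor a valid 1.

```python
import math

VCC = 5.0    # supply voltage in volts (assumed)
V_TH = 2.5   # hypothetical switching threshold in volts
GAIN = 20.0  # hypothetical steepness of the transfer curve


def inverter_out(v_in: float) -> float:
    """Idealized inverting transfer curve: a high input gives a low output."""
    return VCC / (1.0 + math.exp(GAIN * (v_in - V_TH)))


def balance_point(lo: float = 0.0, hi: float = VCC, steps: int = 60) -> float:
    """Bisect for the voltage at which the output equals the input."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if inverter_out(mid) > mid:
            lo = mid  # output is above input, so the balance point is higher
        else:
            hi = mid  # output is at or below input, so it is lower
    return (lo + hi) / 2.0


v = balance_point()
print(f"balance point: {v:.2f} V")
```

In this toy model the loop settles at the threshold itself, which is the kind of mid-rail voltage described above. Real parts have different thresholds and noise, so the exact value should not be read as a prediction.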

My solution is shown in the diagram on the right. A 74HCT157 two-input multiplexer was added to the ALU’s B input. Normally it passes the data bus value through to the B input, but when the ALU drives its result onto the bus, the mux passes zero to the B input instead. It doesn’t really matter what value is passed to the B input, as long as it’s something valid.
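As a rough sketch of the mux behavior just described, the logic can be modeled in a few lines of Python. The function names and the `alu_drives_bus` flag are hypothetical, and which control signal is actually wired to the select pin is an assumption here, not something taken from the schematic.

```python
def mux157(select: int, a: int, b: int) -> int:
    """4-bit 2-to-1 mux: returns input a when select is 0, input b when 1."""
    return (b if select else a) & 0xF


def alu_b_input(data_bus: int, alu_drives_bus: bool) -> int:
    """Value seen at the ALU's B input (hypothetical wiring).

    Normally the data bus passes through. While the ALU result is being
    driven onto the bus, a constant zero is selected instead, so B never
    sees the ALU's own output.
    """
    return mux157(1 if alu_drives_bus else 0, data_bus, 0b0000)


# Normal operation: B follows the data bus.
print(alu_b_input(0b1010, alu_drives_bus=False))
# ALU driving the bus: B is forced to a valid constant.
print(alu_b_input(0b1010, alu_drives_bus=True))
```

The constant could just as well be any other valid 4-bit value, since the point is only to break the loop.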

At first I was reluctant to call this “the cause of the problem”, because I’ve been through so many other apparent solutions in the past few days. At one point I thought that adding capacitors to the data bus fixed the problem, then replacing the bus driver HCT chip with an LS chip, or the fetch register. But none of those solutions actually explained why things didn’t work originally, nor why they fixed the problem. And after more careful testing, replacing the bus driver or fetch register with LS-family chips didn’t actually fix the problem 100% of the time. The combinatorial loop is the only scenario that explains why things weren’t working originally, and breaking it with the mux is the only fix that has worked 100% reliably in all the tests I’ve made.

I’m happy to have finally found the answer to this mystery, but a little unhappy with the form the solution takes. Looking at the revised architecture diagram, it’s not at all obvious to the casual observer why there should be a mux there. The fact that it’s required for electrical reasons and not logical reasons is even worse. It just doesn’t feel “clean”, in some hard-to-define way. Perhaps there’s a better solution, but at this point I’ve spent so much time trying to fix this hardware problem that I just want to move on to writing more fun Nibbler software.

 


18 Comments so far

  1. BartoszP September 22nd, 2013 2:56 am

    Maybe you should look at 74*125 buffer or input switches which could make unexpected “noise” via this buffer ?
    Why ? Just one data line is noisy so it means that:
    1. there is a contact/wiring problem
    2. microswitch connected to this line is “noisy”
    3. chip is “bad”.

    If it is true then *157 MUX just filters these noises.

    My other suspicion is that the data bus is just overloaded. What about removing the *125 chip as a test?

  2. Lennart September 22nd, 2013 9:07 am

    I have found my old TTL data book! And I took a photo of the internal schematic of the 74LS181.

    http://lell.se/bilder/kretsar/P2060099.JPG

    I’m sure it is possible to find it as pdf somewhere.

    There you can see that the A=B output really is an F=0, so your additional OR-gates are not needed.
    Otherwise happy reading.

  3. Lennart September 22nd, 2013 9:12 am

    Oops! Forget about the OR-gates not being needed, it is an AND-gate inside the 74LS181 that feeds the A=B output…

  4. Stephen September 22nd, 2013 9:40 am

    Wow, good find. Glad your logic analyzer was able to catch it when it happened.

    There’s something to be said for not messing with what works, but if the mux bothers you, you could probably achieve the same thing with a quad D flip-flop or transparent latch, depending on the timing details. A latch at the inputs to a combinatorial block is a good design pattern. If that mux is made with transmission gates, it’s probably fine, but if it is made from AND gates, I’m not sure the problem is guaranteed to be gone.

  5. alex555 September 22nd, 2013 11:51 am

    I forgot if you already tried this, but what about making sure the bus is always driven? I also remember reading that 10k is only suitable for pulling up one chip, so it might be worthwhile to try a smaller resistance.

  6. Dr Jefyll September 22nd, 2013 1:01 pm

    Wow — nice work, Steve! I confess I thought you were barking up the wrong tree with this idea that the B inputs were electrically relevant despite being logically irrelevant.

    As for “Perhaps there’s a better solution,” IIRC it’s possible to buy tristate bus driver IC’s that feature Schmitt Trigger inputs. If such a device were substituted for the ’244 you presently have in there, that should effect a solution (since Schmitt Trigger devices are generally incapable of producing indeterminate output levels). That would break the feedback loop of invalid voltages. Congrats again,

    — Jeff

  7. Steve Chamberlin September 22nd, 2013 1:34 pm

    @BartoszP, it’s actually not just one data line that had the problem – I saw it on three lines at different times, sometimes three at the same time. I did try removing the ’125 as well as a bunch of other theories: see the comments thread in my previous post on de-glitching RAM writes.

    @alex555, sorry, I forgot to mention here that I modified the microcode so that the bus is always driven by something. This was buried deep in the comments to my previous post. The microcode change seemed to help, but the problem didn’t completely go away until breaking the feedback loop. The “final solution” has the new microcode that always drives the bus, *and* the mux to break the loop.

    @Stephen, that’s a really good point that the ’157 could be subject to the same feedback problem with bad voltage levels, since it’s just another combinatorial circuit. In 15 hours of continuous testing, it never failed, though. Based on my understanding of how AND and OR gates are built from transistors, I think 0 OR bad = bad, 1 OR bad = 1, 0 AND bad = 0, 1 AND bad = bad. I believe the ’157 will AND the enable signal (0 in this case) with the data bus, and because 0 AND bad = 0, it works.
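The four rules above can be written down as a small three-valued logic model. This is only a sketch of the reasoning in the comment, not a transistor-level description of any chip. The `BAD` marker and the function names are made up for illustration, and the last lines assume the description above in which the ’157 ANDs a 0 with the data bus value.

```python
BAD = "bad"  # stands for an invalid (mid-rail) logic level


def and3(x, y):
    """AND with an extra 'bad' state: any solid 0 forces the output to 0."""
    if x == 0 or y == 0:
        return 0
    if x == 1 and y == 1:
        return 1
    return BAD


def or3(x, y):
    """OR with an extra 'bad' state: any solid 1 forces the output to 1."""
    if x == 1 or y == 1:
        return 1
    if x == 0 and y == 0:
        return 0
    return BAD


# The four cases listed in the comment.
print(or3(0, BAD), or3(1, BAD), and3(0, BAD), and3(1, BAD))

# Hypothetical view of one '157 bit while the ALU drives the bus:
# the gating signal is 0, so a bad bus level cannot pass through.
gated_bus_bit = and3(0, BAD)
print(gated_bus_bit)
```

Under these rules a bad input only propagates when every other input leaves the gate's output undecided, which is why a solid 0 on an AND input is enough to block it.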

    @Dr Jefyll, the 74LS244 does have some kind of hysteresis at the inputs which the 74HCT244 lacks, but I don’t think it’s a Schmitt trigger. I tried subbing in a 74LS244 before I added the mux, and it did help some, but the problems still occurred.

  8. Steve Chamberlin September 22nd, 2013 1:47 pm

    There are two different ’181 ALU functions that will output A: M=0, S=0000, or M=1, S=1111. From looking at the ’181 internal diagram that @Lennart posted, assuming that 0 AND bad = 0 is correct, then you can see that for M=0, S=0000 the B input should not be electrically relevant. Nibbler’s microcode actually uses the M=1, S=1111 function though. I didn’t try to reason through the whole diagram to see where the bad values might propagate, assuming my reasoning about propagation of bad values is even correct.

  9. Hans Franke September 22nd, 2013 9:34 pm

    Now, if you already introduce another chip to steadily drive B, why not a ’173 instead? Same chip count, added functionality. Also, if using a ’181 seems to be a tricky thing, we could replace it with a ROM :)

    *Duck and Cover*

  10. Steve Chamberlin September 23rd, 2013 6:49 am

    @Hans, I did take another long look at adding a B register. As you say, if there has to be a chip there anyway, why not make it a register instead of a mux? I may still do that, not sure yet.

    But this is interesting: I just worked through the ’181 ALU internal diagram by hand for the case where A=1111, B=XXXX, Cin=1, M=1, S=1111. This is what happens when Nibbler drives the accumulator value to the data bus. In the first input stage, there are some gates that compute A0*B0*S3 NOR A0*/B0*S2. In our case, that reduces to B0 NOR /B0!

    This is exactly the kind of problem I was expecting. B0 NOR /B0 will be 0 for any valid value of B0, but if B0 is at an invalid logic level, the result will be two half-on transistors and an invalid output voltage.

    A second way to drive the accumulator value to the bus is Cin=0, M=0, S=0000. I chose the first way arbitrarily. By again working through the ’181 internal diagram, the Cin=0,M=0, S=0000 case should *not* result in any invalid voltage outputs! So it looks like if I just change the microcode to use this second way, I can safely remove the mux.
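The first-stage term quoted above, A0*B0*S3 NOR A0*/B0*S2, can be checked under both select settings with a small three-valued model. This is a sketch that covers only that one term, not the whole ’181 diagram, and it assumes the "bad input" propagation rules discussed earlier in this thread. The helper names and the use of `None` for an invalid level are illustrative choices.

```python
X = None  # stands for an invalid (mid-rail) logic level


def not3(v):
    """Inverter: an invalid input gives an invalid output."""
    return X if v is X else 1 - v


def and3(*vs):
    """AND: any solid 0 wins, otherwise any invalid input poisons the result."""
    if any(v == 0 for v in vs):
        return 0
    if any(v is X for v in vs):
        return X
    return 1


def nor3(*vs):
    """NOR: any solid 1 forces 0, otherwise any invalid input poisons the result."""
    if any(v == 1 for v in vs):
        return 0
    if any(v is X for v in vs):
        return X
    return 1


def first_stage_term(a, b, s3, s2):
    """The quoted term: (A*B*S3) NOR (A*/B*S2)."""
    return nor3(and3(a, b, s3), and3(a, not3(b), s2))


# S=1111 (S3=S2=1) with A0=1: the term reduces to B0 NOR /B0.
print(first_stage_term(1, 0, 1, 1), first_stage_term(1, 1, 1, 1))
print(first_stage_term(1, X, 1, 1))

# S=0000 (S3=S2=0): both AND terms are held at 0, so B0 drops out.
print(first_stage_term(1, X, 0, 0))
```

With S3=S2=1 a valid B0 always gives 0 and an invalid B0 gives an invalid result, while with S3=S2=0 the term is a solid 1 whatever B0 is doing, which is consistent with the hand analysis for this term.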

  11. Dr Jefyll September 23rd, 2013 7:46 am

    “I just worked through ’181 ALU internal diagram by hand” Cool. But isn’t it possible the conclusions you’ve gathered will be inapplicable to ’181s from other vendors and/or other logic families? That could complicate parts sourcing for others who wish to build a Nibbler.
    Instead I like the idea of using a ROM instead of a ’181. ROMs are already on the parts list, so it’s “free” in that sense: no extra shopping. Better yet, you’re not stuck with the ’181’s somewhat arcane functionality. Instead you could customize, with functions intended specifically for Nibbler.
    (Also, a P.S. re Schmitt trigger bus drivers: these do exist, although they’re uncommon. The 74HC and HCT 7541 are examples. http://www.nxp.com/documents/data_sheet/74HC_HCT7541.pdf )
    — Jeff

  12. Steve Chamberlin September 23rd, 2013 10:08 am

    That’s a good question. Surprisingly, TI seems to be the only company that manufactures the LS181. But if another company did clone it, in order for it to be 100% compatible I believe it would have to have the same internal logic. A ROM-based ALU would definitely have some advantages, but it would be substantially slower (150ns vs about 33ns), and wouldn’t allow for eliminating the mux.

  13. Dr Jefyll September 23rd, 2013 1:56 pm

    “in order for it to be 100% compatible I believe it would have to have the same internal logic”
    Certainly all ’181s must conform to the same Truth Table. But, even so, the internal implementation might vary, especially for CMOS devices such as the Philips ’HC181. BTW & FWIW, a quick search reveals that, besides LS and HC, the ’181 also exists (or existed) in ACT, AS and F families; possibly others too.
    As for TI’s LS181, their data sheet does indeed show an internal logic diagram, but I doubt such diagrams are reliable as literal, gate-for-gate representations. Not intending to contradict you; I’m just saying it’s hard to be certain. The diagram may simplify or omit any internal details deemed to be uninformative.
    In such cases there *may* be a note. One such note I saw (I forget which chip) said the logic diagram wasn’t to be used for estimating prop delays. That’s a pretty clear indication they’ve paraphrased things — glossed over a confusing optimization, perhaps. But, to me, the *absence* of such a note is no assurance that they *haven’t* paraphrased things. Not wishing to seem negative, but I’d be uncomfortable with a fix that’s based on an assumption about TI’s LS181 logic diagram. And it leaves open the question of other vendors’ implementations.

    — Jeff

  14. Steve Chamberlin September 23rd, 2013 3:51 pm

    The proposed change worked perfectly! See http://www.bigmessowires.com/2013/09/23/invalid-logic-levels-explained/ for more details.

  15. Hans Franke September 25th, 2013 7:16 am

    grandpa story: *in the good ol’ days*

    AFAIR everyone (Fairchild, Philips, TI, …) had the same internal drawing in their data sheet. Some (most notably Siemens) even gave away their internal transistor structure. That was at a time when data sheets weren’t proofread by marketing to create trade secrets, but by engineers looking for solutions.

    On ROM based ALU: yes, it would be slower – like 30-50ns. Classic 74S287 is about 30 ns total access time, even faster than the ALU, but you’d need 4 of them. An Atmel 27C256 (32Kx8) OTP is about 45ns access time, so not really a slowdown – and due to the fact that it has an OE input, you could scrap the 241 separating the ALU output from the data bus, resulting in a unified data bus, where an IN or LD#imm does not have to go through the ALU at all (hint: a speed-up is possible for some operations :) ), avoiding the C=A issue.

  16. Steve Chamberlin September 25th, 2013 8:07 am

    I believe the first Apple II’s also included complete schematics for the computer. Those were different days!

  17. Hans Franke September 26th, 2013 4:12 am

    They did – until the IIe – Also all ROMs (except for MS-Basic) were listed in the early manuals for the computer itself and every I/O card.

  18. Erik Petrich September 26th, 2013 11:22 pm

    I learned my first assembly language from the Apple II hardware reference manual. It had the instruction set summary excerpted from the 6502 datasheet and also the monitor ROM source code to give me examples of how to put the instructions together to perform bigger tasks. It amazed me how one could take very simple instructions and put them together to achieve the very complex.
