Yellowstone Glitch, Part 3: Train Wreck

This Yellowstone glitching problem has gone from puzzling to frustrating to potentially project-ending. I’m still not exactly sure what’s going wrong, let alone how to fix it, and I’ve nearly exhausted all my troubleshooting ideas. In the hopes that maybe the problem could be explained by damaged hardware, I assembled an entirely new Yellowstone card from spare parts, but it fails in exactly the same way as the original card. Grrrr.

Let’s review the facts here. In limited testing, Yellowstone appears to work great on the Apple IIe for controlling any type of disk drive: Smartport drives like the Unidisk 3.5 or Floppy Emu’s hard disk mode, dumb 3.5 inch drives like the Apple 3.5 Drive, and 5.25 inch drives. On the Apple IIgs it works for Smartport drives and 5.25 inch drives, but dumb 3.5 inch drives almost always result in a crash while booting the disk.

The immediate cause of the crash looks like this: While the computer is executing code from the Yellowstone’s onboard ROM in address range $C800-$CFFF (which is actually internal FPGA memory), the card suddenly thinks it’s been deactivated and that it’s no longer in control of that address range. So it stops outputting bytes, the CPU reads and executes random garbage bytes, and there’s a crash. Yellowstone thinks it was deactivated because it thinks the CPU put the address $CFFF on the bus, which deactivates all peripheral cards. But that’s not true.
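The failure mode above can be illustrated with a toy model. This is only a sketch, not the Yellowstone FPGA logic: it assumes the usual Apple II convention that a card claims $C800-$CFFF after its own slot ROM page is accessed and releases it on any access to $CFFF. The class name, method name, and slot number are hypothetical.

```python
# Toy model of expansion-ROM ownership, NOT the actual Yellowstone design.
# Assumption: access to the slot's $Cn00-$CnFF page activates the card,
# and any access to $CFFF deactivates it. Slot 5 is an arbitrary choice.

EXPANSION_START = 0xC800
EXPANSION_END = 0xCFFF
RELEASE_ADDRESS = 0xCFFF


class ExpansionRomOwner:
    def __init__(self, slot):
        self.slot_page_start = 0xC000 + slot * 0x100  # $C500 for slot 5
        self.active = False

    def observe(self, address):
        """Update ownership from one bus address.

        Returns True if the card would drive the data bus for this access
        to the $C800-$CFFF window.
        """
        if address == RELEASE_ADDRESS:
            self.active = False
        elif self.slot_page_start <= address <= self.slot_page_start + 0xFF:
            self.active = True
        in_window = EXPANSION_START <= address <= EXPANSION_END
        return self.active and in_window


card = ExpansionRomOwner(slot=5)
card.observe(0xC500)                                  # slot ROM access activates the card
before = [card.observe(a) for a in (0xC825, 0xC826)]  # normal reads are driven
card.observe(0xCFFF)                                  # a phantom $CFFF, as if a glitch were latched
after = card.observe(0xC827)                          # the card no longer drives the bus
print(before, after)
```

In this model a single falsely latched $CFFF is enough to leave the CPU reading an undriven bus on the next fetch, which matches the crash described above.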

For unknown reasons, this problem only ever occurs during a clock cycle when the CPU is reading from Yellowstone’s SRAM chip, although I don’t have any idea why that’s relevant. Nor can I explain why it only happens for 3.5 inch disk drives, and only on the Apple IIgs.

At first I thought it was a clock glitch on Q3 causing Yellowstone to incorrectly see and react to a phantom $CFFF address, and I captured some Q3 glitches with the logic analyzer. But then I started noticing other glitches on other signals, some of which appear tens of nanoseconds before the Q3 glitch. So it’s more likely that the Q3 glitch is a symptom of some other problem, rather than the root cause.

My best guess (but only a guess) is there’s a problem with the 3.3V power supply, and some transient noise on the supply is inducing glitching in multiple locations. With an oscilloscope, I observed a few instances where the 3.3V supply very briefly swung as high as 3.84V and as low as 2.76V, at the same time the data bus driver was enabled. But I’m a bit suspicious of those numbers. My oscilloscope always seems to show wild ringing on signals, no matter what project I’m working on, so I’m thinking that’s at least partly the result of my probes or the way I’m taking the measurements. I made several changes to the bus driver, including advancing and delaying the enable timing, and adding more bypass capacitors with various values, but nothing seemed to make a difference in preventing the glitches.
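To put those scope readings in perspective, here is a small calculation of how far each excursion sits from the nominal 3.3V. The ±10% window is an assumed rule of thumb for illustration, not a figure taken from any datasheet for the parts on this board.

```python
NOMINAL_V = 3.3
ASSUMED_TOLERANCE_PCT = 10.0  # assumption for illustration, not a datasheet value


def rail_deviation(volts, nominal=NOMINAL_V):
    """Return (deviation in volts, deviation as a percent of nominal)."""
    deviation = volts - nominal
    return deviation, 100.0 * deviation / nominal


for label, volts in (("high", 3.84), ("low", 2.76)):
    dev_v, dev_pct = rail_deviation(volts)
    within = abs(dev_pct) <= ASSUMED_TOLERANCE_PCT
    print(f"{label}: {volts:.2f} V is {dev_v:+.2f} V ({dev_pct:+.1f}%), within window: {within}")
```

Both excursions are 0.54V from nominal, roughly 16%, so if the readings are real they are well outside the assumed window. If they are mostly probe ringing, the true excursions could be much smaller.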

Without being able to clearly characterize exactly where and when the problem is occurring, my hopes for fixing it are low. I still don’t know whether what I’ve observed on the logic analyzer is the cause, or only a symptom.

The bad news is that I’m running out of ideas about what else to try. This train is headed down the wrong track, and the next stops are Frustrationville, Dead-End Town, and Abandon City.

15 Comments so far

  1. Matt - June 15th, 2021 4:52 pm

    If it’s any consolation, by building the second one you’ve ruled out the components you thought you might’ve fried. Is there anything to be learned by comparing the signals on the two boards? Is the 3.3V rail glitching the same way on both? Can you change the behavior by rerouting some portion of the 3.3V supply through wires rather than the board?

  2. Stephen Moody - June 16th, 2021 12:34 am

    What current rating is the 3.3V regulator you are using? If there is a dip in the voltage level when the buffer is switched it’s possible that the regulator is failing to supply the required current.

  3. Steve - June 16th, 2021 7:03 am

    It’s an 800 mA regulator, which should be more than enough. But I like your thinking… this does seem like the kind of thing that might explain it.

    It’s worth mentioning the 3.3V supply is partitioned, with most of the chips being supplied straight from the regulator, but the FPGA’s supply being isolated with a ferrite bead as recommended in the datasheet. So maybe that’s a problem? But I’ve tried adding additional capacitance on the FPGA side, as well as bypassing the ferrite bead with a small jumper wire, and neither modification seemed to help.

    The problem happens when the SRAM is outputting a byte and the ‘245 is driving that byte onto the data bus. But whether that’s the cause of the problem, or merely some factors that are correlated with the problem, I don’t know. The SRAM and ‘245 are activated simultaneously thousands of times without apparent problems, until the one time there’s a glitch. It may also depend on other factors like what address is being read and what value is being returned, or other factors I haven’t identified.

  4. Hales - June 16th, 2021 4:03 am

    (Sorry if this triple-posts. I’ve had issues commenting on your site a few times before and I think my comments have never made it. The captcha just takes me back to the blogpost page with no message or effect. This time around I’m trying clearing my cookies and enabling 3rd party stuff.)

    > My oscilloscope always seems to show wild ringing on signals, no matter what project I’m working on, so I’m thinking that’s at least partly the result of my probes or the way I’m taking the measurements.

    I would recommend pursuing this avenue. Your scope could be really trying to help you here, don’t disregard its readings until you can prove otherwise.

    There is a chance your scope+probes don’t have a flat frequency response (this can make high-freq transitions on signals look big & look like they swing around ringing a lot). Have you tried looking at the test square wave that comes out of a little hook on the front of your oscilloscope? If it’s nice and perfectly square then you’re good. If it’s off (and adjusting the little screws in your probes does not help) then you will need to buy some new probes (different scopes & different probes have different compensation tuning ranges, sometimes you can’t get them to match, or they can just be broken).

    If you suspect that your line/bus drivers are causing power supply rail ringing: have you considered adding series resistors to their outputs? Just like you often have to add series resistors to the gates of large mosfets, otherwise they ring. Each bus line may have a reasonable amount of capacitance and the output transistors may be quite low in R, leading to large current spikes. Approx 100 ohms (on every bus line) is a good starting point just to see if it has an impact.
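A rough back-of-envelope sketch of why a series resistor limits those current spikes. It treats each bus line as an ideal step into a capacitor, and every number other than the 100 ohm suggestion is an assumption for illustration: a 3.3V swing, 10 ohms of driver output resistance, and 50 pF of line capacitance.

```python
# All values below are assumptions for illustration, except the 100 ohm
# series resistor suggested in the comment above.
SWING_V = 3.3          # assumed output swing
DRIVER_OHMS = 10.0     # assumed driver output resistance
LOAD_PF = 50.0         # assumed capacitance of one bus line
SERIES_OHMS = 100.0    # suggested starting value
BUS_LINES = 8


def peak_current_ma(swing_v, driver_ohms, series_ohms):
    """Initial charging current into a discharged capacitor, in mA."""
    return 1000.0 * swing_v / (driver_ohms + series_ohms)


def time_constant_ns(driver_ohms, series_ohms, load_pf):
    """RC time constant in ns (ohms * pF gives ps, so divide by 1000)."""
    return (driver_ohms + series_ohms) * load_pf / 1000.0


for series in (0.0, SERIES_OHMS):
    per_line = peak_current_ma(SWING_V, DRIVER_OHMS, series)
    tau = time_constant_ns(DRIVER_OHMS, series, LOAD_PF)
    print(f"series {series:5.1f} ohm: {per_line:6.1f} mA per line, "
          f"{per_line * BUS_LINES:7.1f} mA for {BUS_LINES} lines, tau {tau:.2f} ns")
```

Under these assumptions the resistor cuts the worst-case spike by roughly a factor of ten, at the cost of a slower edge. The trade-off is only acceptable if the slower edge still meets the bus timing.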

  5. Steve - June 23rd, 2021 4:46 pm

    Sorry for the delay on this comment’s appearance, Hales. I’m working on some better scope measurements, which I’ll post soon. I haven’t tried the reference square wave, but I’ll do that right now. I agree about series resistors, as is discussed in some of the later posts in this series.

  6. Steve - June 23rd, 2021 5:11 pm

    The test signal on the front of my scope is 1 kHz. All four probes looked pretty good with the test signal: maybe just slightly off. I adjusted two of them, but the other two probes don’t have any kind of adjustment screw that I can see. Edit: actually channel 3 (pink) is pretty far off from square, but it’s one of the probes without an adjustment screw.

  7. Stephen Moody - June 16th, 2021 7:36 am

    I would have thought that 800 mA would be more than enough for something like this.

    Powering the FPGA through the ferrite would be fine. It can be hard to work out how much power an FPGA will require but the development tools can often give you an estimate. If you have a bench supply handy then it could be worth powering the 3.3V on the board from there. That should make sure the voltage is steady and can rule that out as an issue.

    These sorts of issues can be tricky to track down. I don’t know anything about the bus signals on the Apple, but I would look at the clock lines on the logic analyser at the same time if you can and see if there’s any correlation between the glitches and the clock edges. With a clock of 7 MHz I wouldn’t think propagation time on the buffers would be a major issue, but it could be worth checking the timing on those as well.

  8. Steve - June 16th, 2021 9:04 am

    I was wondering about using a bench supply for 3.3 volts, as a test. Do you think it’s safe to connect my bench supply to the card’s 3.3V rail, when the card’s voltage regulator is also driving that rail? They’ll both be outputting the same nominal voltage, but if there’s ever a few hundred millivolts of difference, I don’t want to damage the voltage regulator or the bench supply by back-powering them.

  9. Stephen Moody - June 16th, 2021 9:14 am

    It should be safe to connect the two supplies in parallel like that. I would power up the bench supply after the computer to be safe though. If the problem is on the 3.3V supply then doing that should resolve the issue.

  10. Steve - June 16th, 2021 11:25 am

    Unfortunately the bench supply at 3.3V didn’t help. But I took a closer look at what addresses and values are being read from RAM when it glitches, and found something interesting.

    The value that’s being read is usually FF. I also saw F7, CF, and 03. Maybe it’s a coincidence, but except for that last value, it looks like output bytes with many 1 bits are associated with the problem. The address that’s being read also doesn’t seem random: several times it glitched while reading FF from address C825. Maybe that’s just where FF is normally stored in the buffer, or maybe there’s something significant about that address. I also saw glitches while reading from C805, C82A, C907, CB02, and others, so I’m not sure the address is significant.

    I was actually able to reproduce the glitch several times without booting a disk, just by using the Apple II monitor to write FF to C825 and then reading it back. It’s not reliably reproducible though. It happened three of the first five times I tried it, but then for some reason it became much more rare.
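To put a number on the "many 1 bits" observation, a minimal sketch that counts the set bits in each of the byte values reported above. The function name is just for illustration.

```python
def ones_count(value):
    """Number of 1 bits in the low 8 bits of value."""
    return bin(value & 0xFF).count("1")


# Byte values reported as being read when the glitch occurred.
glitch_values = [0xFF, 0xF7, 0xCF, 0x03]
counts = {f"{v:02X}": ones_count(v) for v in glitch_values}
print(counts)  # {'FF': 8, 'F7': 7, 'CF': 6, '03': 2}
```

Three of the four values have six or more bits set, which fits the idea that many outputs going high at once is a factor. 03 is the outlier, and four samples are far too few to draw a firm conclusion.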

  11. Steve - June 16th, 2021 12:46 pm

    Getting closer to an explanation. I can reproduce the problem without any disk drive even connected, just by using the Apple II monitor to list the contents of the Yellowstone card’s RAM.

  12. Jeff - June 16th, 2021 3:11 pm

    This sounds like a classic signal integrity problem. The resistor you mentioned in a previous post is a little large, but usually these are there to series terminate the signal. That is, to stop it ringing.

    What I suspect is not your power supply, but ground return problems. Ask yourself: “What is the driver for this signal, and how does the signal current get back to its power supply ground?” If that is a high impedance path at high frequency (e.g. long thin traces), then there will be a surprisingly large voltage induced in the ground. That is also likely why your oscilloscope is always showing you nasty ringing… the probe ground lead is resonating, because it’s trying to return the signal current to some random place on the board.

    A couple suggestions:

    1. Make a low Z probe and look again. Get a section of 50ohm cable and a 950ohm (or 1k) resistor. Solder that resistor to the centre conductor short… less than 1cm of lead. Then solder that directly to the signal pin you’re trying to measure. Take the braid into a short piece of wire, less than 2cm, right to ground on the same chip. Terminate this coax at your oscilloscope in 50ohm and have a look. Doesn’t need to be fancy, and will get you out to at least 500MHz. Google comes up with this random example http://paulorenato.com/index.php/electronics-diy/93-praise-for-the-lo-z-probe

    2. Make sure there are no latches or async logic in your FPGA. I know, that’s a pain… but it’s low cost. Make sure you’ve got fully sync logic in there, and any clock domain crossings are handled with appropriate metastable flops.

  13. Jerry - June 16th, 2021 8:04 pm

    Some FPGAs have slew rate settings on the output pins. If you can configure them, configure them to all slow slew rate. That’ll help if it’s ground bounce.

    On one prior design I had to stagger the address and data lines by a clock so they weren’t all changing at once.

    Is this a 4 layer board or a 2 layer board? Your pics imply it’s only a 2 layer board. Ground planes can help a lot with bounce.

  14. Steve - June 16th, 2021 9:37 pm

    Thanks for the ideas. It does seem like a signal integrity problem. I’ve seen designs for DIY scope probes like those before, but I think I’m missing something. If that design is superior, and simple to build, why aren’t normal scope probes built that way? What’s the trade-off?

    This is a 2-layer board. I don’t have any experience with 4-layer boards, and my software doesn’t support it. But there has to be a first time for everything, if I’m forced to revise the board. I do make an effort to keep ground traces as short and fat as possible, and also use ground fills. But I know it’s not the same as having a ground plane.

    I will double check the output slew rates on the FPGA. I think they’re already configured for slow. But almost all of the relevant signals here are FPGA inputs coming from the Apple II, not FPGA outputs.

    I’ve posted a Part 4 of this glitching saga, with some new information here: https://www.bigmessowires.com/2021/06/16/yellowstone-glitch-part-4-the-plot-thickens/

  15. Jeff - June 16th, 2021 10:30 pm

    “why aren’t normal scope probes built that way? What’s the trade-off?”. Yes, that seems to be the common question, and I’ve had many engineers assume it’s an amateur folly. The answer is that the probe presents 1k impedance to the circuit under test (loading it 1mA / volt), and is DC coupled 20:1. In return, you get as near perfect response as you can from probing a very high speed low voltage digital signal. It’ll work on a lot of 50ohm RF circuits also, perturbing a low Z circuit very little. It’s not better, it’s better for some use cases, like this one. If you’re looking at pretty much any other small signal analog circuit, this probe guarantees you’ll just load it so that it stops working.

    I had assumed the problem here is mostly computer-fpga. You might try some SMD series ‘stopper’ resistors placed on the traces (just cut and scrape the solder mask away) to series terminate, and see if that affects things. Your trace impedance might be around 150ohm or so for 2 layer, doesn’t have to be a perfect match so maybe start there.

    One more suggestion: Copper foil tape (maybe with some Kapton tape under it) can get you probably as close as possible to a low Z ground plane to see what happens.
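The 1k loading and 20:1 figures quoted for the low Z probe follow from a simple resistive divider: the series resistor at the probe tip feeds a 50 ohm coax terminated in 50 ohms at the scope. A minimal sketch of that arithmetic, treating the cable as ideal and using hypothetical function names:

```python
def probe_attenuation(series_ohms, termination_ohms=50.0):
    """Divider ratio: signal voltage divided by voltage seen at the scope."""
    return (series_ohms + termination_ohms) / termination_ohms


def probe_loading_ma_per_volt(series_ohms, termination_ohms=50.0):
    """DC load on the circuit under test, in mA per volt of signal."""
    return 1000.0 / (series_ohms + termination_ohms)


for series in (950.0, 1000.0):
    print(f"{series:.0f} ohm tip resistor: "
          f"{probe_attenuation(series):.0f}:1, "
          f"{probe_loading_ma_per_volt(series):.3f} mA/V")
```

With 950 ohms the ratio is exactly 20:1 and the load is 1 mA per volt. With the more common 1k resistor it becomes 21:1, which is easy to correct for when reading the scope.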