Last October, Plus Too first booted successfully into the Macintosh Finder. Ever since then, it’s exhibited an intermittent mouse freezing bug. The FPGA Mac replica runs normally for a few minutes, during which the mouse works normally, and it’s possible to exercise menus, run programs, and do everything else you’d expect from a working Mac. But somewhere after a minute or two of activity, the mouse pointer invariably freezes in one spot, and the computer seems to halt. The bug appears to be related to mouse movements, and faster, more frequent mouse movements cause the problem to appear sooner. If the mouse remains unmoved, then Plus Too will happily run for hours without problems.
In October I was already tired from the work needed to get Plus Too to that point, and had no desire to chase the mouse freeze problem further at that time. The project sat idle while I turned my attention to other things, and saw now further progress until this week. That’s when I decided it was finally time to track down the cause of the mouse freeze bug.
Macintosh mouse handling requires two different interrupts in order to work correctly. When the user moves the mouse, the SCC triggers a level 2 CPU interrupt to read the new position data. The interrupt handler adjusts a low memory global variable called MTemp to set the new on-screen mouse pointer position. Then every 1/60th of a second during the VBLANK interval (video retrace), the VIA triggers a level 1 CPU interrupt. The VBLANK interrupt handler erases the on-screen mouse pointer from its old position, and redraws it at the new position indicated by MTemp.
When the Plus Too mouse froze, I found that the level 2 SCC interrupt was still getting called normally, and MTemp was being adjusted correctly. However, the level 1 cursor VBL task was not getting called, so the mouse pointer was never redrawn at the new position. Further investigation showed that no other VBL tasks were getting called either. In fact, no level 1 VIA interrupts of any type were being processed. At first I thought this might be a problem with the Verilog code that implements my VIA replica, but I found that the VIA was still asserting its IRQ line, but the CPU was just ignoring it. Why?
According to the CPU status register, when a mouse freeze occurs, the CPU is permanently stuck with its current interrupt priority level at 1, instead of its normal value of 0. Because interrupts equal to or below the current IPL will be ignored, no VIA interrupts are ever processed, so the mouse VBL task never gets called. Level 2 SCC interrupts can still pre-empt the CPU, so MTemp gets updated correctly, but when the level 2 interrupt handler completes it returns to whatever the CPU was previously doing at level 1.
Stuck at Interrupt Priority Level 1
So how might the CPU get stuck at IPL 1? How does it get to IPL 1 in the first place? The normal way IPL 1 is reached is during a level 1 VIA interrupt handler, when the CPU sets the IPL automatically. These handlers normally do some processing and then return, which automatically restores the IPL to 0. This means one way the CPU could get stuck at IPL 1 would be if a level 1 interrupt handler went into an infinite loop and never returned. Looking at the level 1 interrupt handlers in the Mac Plus ROM, there are:
- One Second timer – From inspecting the code, this is a trivial handler, and will always return.
- VBLANK – This handler explicitly sets the IPL back to 0, so it can be pre-empted by other level 1 interrupts.
- Timer 1 and keyboard – I haven’t implemented these interrupts in the VIA yet, so they can never occur.
- Timer 2 – This is the only VIA interrupt yet implemented whose ROM handler might conceivably fail to return.
- System handlers – After booting the Mac, the system software might install new VIA interrupt handlers or patch the ones in ROM, creating additional opportunities for handlers that don’t return. Unfortunately I have no good way to test that further.
In addition to a non-returning level 1 interrupt handler, the other way the CPU could get stuck at IPL 1 is if some code explicitly sets the IPL to 1. From looking at a disassembly of the ROM code, several routines definitely do this when modifying global lists: VInstall, PostEvent, OSEventAvail, FlushEvents. The Sony floppy driver also explicitly sets the IPL to 1 in at least two cases. There are also many examples in the ROM code where the IPL is set using a value passed in a register or on the stack, where I can’t say for certain what value it’s being set to. And as before, the system software loaded from disk might contain additional code that directly manipulates the IPL, which I wouldn’t see in the ROM disassembly.
The best way to determine what’s happening would be to wait until the mouse freezes, then pause the CPU when it’s stuck at IPL 1, and look at what code it’s executing. I’ve attempted to do just that, but I lack good tools for software debugging (as opposed to debugging the Verilog hardware model), and I haven’t been able to learn anything very useful. Whenever I interrupt the CPU, it’s either executing some system code in RAM that was loaded from disk, or some fairly innocuous piece of ROM code like the trap dispatcher. I’ve been able to determine any higher level purpose to the code that suggests what it’s trying to do or why it never exits IPL 1.
Finding a Fix
One path might be to add MacsBug to my system disk image, then invoke it when the mouse freeze occurrs, and examine the stack trace and disassembly in an attempt to learn more. MacsBug requires the use of a keyboard, though, and I haven’t yet implemented the keyboard hardware. Even if the keyboard worked, I’m reluctant to start into debugging random pieces of system software that I know nothing about, but maybe that’s unavoidable.
Another possibility is to determine what was the most recent time the IPL was changed from 0 to 1. That might not be enough information to solve the problem, but it would be a start. I might be able to find that info using Altera’s Signal Tap logic analyzer, or maybe I could modify the Verilog machine model to keep track of the IPL changes for me.
My hunch is that some piece of code is going into an infinite loop while trying to access a piece of hardware I haven’t yet implemented, like VIA timer 1, the keyboard, serial port, sound hardware, or PRAM. If all else fails, I could just keep adding more hardware to my Verilog model, and see if the mouse freeze problem disappears at some point. One intriguing clue is that the mouse problem is much more difficult to reproduce when the General control panel is in the foreground. This control panel sets the date and time, sound volume, and other settings that are stored in PRAM. With PRAM not yet implemented, the control panel behaves oddly, and the system time never advances beyond 12:00:00 AM. Perhaps the General control panel is constantly attempting to read or write PRAM, which somehow affects the likelihood of the mouse freeze bug occurring? It’s little more than a wild guess, but PRAM is as good a place as any to start implementing more hardware.
One thing I can’t explain is why frequent rapid mouse movements appear to cause the freeze problem, since my investigations suggest the frozen mouse pointer is merely a symptom of VIA interrupts not getting processed, rather than a cause of anything. Since mouse movements generate a level 2 SCC interrupt, maybe there’s a bug in my design that occurs when a level 2 interrupt pre-empts a level 1 interrupt under certain conditions, or when both interrupts are triggered at the same time. There are some bugs in my mouse implementation as well, which appear to cause a backlog of mouse updates under some situations. I’d assumed these were unrelated to the freezing problem, but maybe I should try getting to the problem of that first. I wish I had a clearer idea of how to proceed, instead of just clutching at straws!8 comments