Thoughts on Floppy Emu Redesign

April 6th, 2018 | Category: Floppy Emu | Author: Steve

I’ve been pondering what a redesigned Floppy Emu might look like – what ICs might be involved. So why mess with success? The current design combines a microcontroller with a CPLD, which has proven to be a powerful and flexible combination. But the specific microcontroller and CPLD that I chose have both become a bit outdated, hard to find, and too expensive relative to the alternatives. By replacing them both with more modern parts, I could probably gain better features while simultaneously improving the manufacturing outlook, for a perfect win/win. Or with a sufficiently powerful microcontroller, I might be able to completely eliminate the CPLD.

CPLD

The CPLD (complex programmable logic device) is like a miniature FPGA, and is there to handle the timing-critical bit twiddling that the microcontroller can’t. It ensures the data bitstream appears at exactly the right speed, with no jitter, and also functions as a fancy parallel-to-serial shift register. For Apple II disk emulation it makes the work easier, and for Macintosh disk emulation it’s essential. The Mac treats floppy drives sort of like a 16×1-bit external memory, and microcontroller software isn’t fast enough to react to the changing address inputs and supply the correct data output.
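To make the shift-register role concrete, here is a minimal software model of a parallel-to-serial shift register. It is only a sketch, not the Emu’s actual CPLD logic: the `ShiftReg` type and function names are hypothetical, and it assumes bits leave MSB first, one per bit-cell clock tick. The CPLD performs the tick step in hardware with exact timing; the underrun case shows what would happen if the microcontroller failed to reload a byte in time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the CPLD's parallel-to-serial job.
 * Assumption: bits are shifted out MSB first, one per bit-cell tick. */
typedef struct {
    uint8_t shift;     /* bits not yet sent, next bit in the MSB position */
    uint8_t remaining; /* number of valid bits left in 'shift' */
} ShiftReg;

/* Parallel load: what the microcontroller does once per byte. */
static void shiftreg_load(ShiftReg *sr, uint8_t byte) {
    sr->shift = byte;
    sr->remaining = 8;
}

/* Serial side: called once per bit cell. Stores the next bit in *bit_out
 * and returns true, or returns false on underrun (no byte was reloaded
 * in time, which the real hardware must never allow). */
static bool shiftreg_tick(ShiftReg *sr, bool *bit_out) {
    if (sr->remaining == 0) {
        return false;
    }
    *bit_out = (sr->shift & 0x80u) != 0;
    sr->shift = (uint8_t)(sr->shift << 1);
    sr->remaining--;
    return true;
}
```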

The specific CPLD used by Floppy Emu is the Xilinx XC9572XL, which I mainly chose for being 5V tolerant. It works, but it can only store a very small amount of logic, which forces me to have separate and independent firmwares for Apple II and Macintosh emulation. With a larger and more capable chip, I could merge those into a single firmware.

The Xilinx XC9572XL is old enough that it’s approaching “legacy” status, and may be in line for discontinuation soon. Xilinx barely mentions it on their web site, and the chip has been sold out at all of their distributors for the past several weeks, which is worrisome. During the years I’ve been building Floppy Emus, the price of the XC9572XL has also steadily climbed to where it’s now more than double its original cost.

There aren’t many great alternatives, because it seems the whole CPLD market is slowly dying in favor of FPGAs, and most of what’s left has even smaller logic storage limits than the XC9572XL. The best option is probably a small FPGA instead, like one of the Lattice MachXO or MachXO2 devices similar to what’s in the Yellowstone disk controller.

A second idea is to replace the CPLD with a small, dedicated microcontroller, like something from the ATTINY series. With only one or two very simple tasks to do, maybe a mini-microcontroller would be able to keep up. I spent half a day pursuing this line of thinking, going as far as writing a first draft of the code for this hypothetical microcontroller, but decided to focus on other solutions first.

Microcontroller

Floppy Emu uses an ATMEGA1284 microcontroller, which in 2018 is almost a joke. Its sole advantage is that it’s a close relative of the ATMEGA chip used in Arduinos, so there’s tons of example code available for it. But it’s woefully underpowered compared to basically any other microcontroller out there. Any of the popular 32-bit ARM microcontrollers would have significantly faster clock speeds, more memory, and better peripheral options, and would probably cost less too.

Sadly there are still no common microcontrollers with enough internal RAM to buffer an entire floppy disk image (140K to 1440K depending on the image type). That would vastly simplify the emulation, help fix some occasional emulation hiccups, and possibly lead to better handling of copy-protected disks. But that will have to wait until the 2020s, it seems.

After looking at a whole range of options, I’ve got my eye on the Atmel SAM4S series of microcontrollers. These boast speeds up to 120 MHz, while remaining relatively inexpensive.

Single-Chip Design?

Earlier I mentioned that the CPLD is essential for certain types of disk emulation, where microcontroller software isn’t fast enough to react to changing inputs in real time. That’s true for an older 8-bit microcontroller like the ATMEGA1284 running at 20 MHz, but what about for a modern 32-bit ARM running at 120 MHz? That might be a different story.

The hardest test will be 3.5 inch floppy disk emulation for the Macintosh. When the Mac wants to check some item of drive state, like whether there’s a disk inserted or whether the disk head is at track zero, it sets four IO lines with the right values to select from among the 16 possible state bits. Then it reads the result on a fifth IO line. It works just like an asynchronous SRAM, and the drive (or Floppy Emu emulating a drive) never knows if or when the Mac is reading state info. The drive needs to constantly respond, so whenever one of those four IO lines changes, the value on the fifth IO line must be updated immediately to reflect the newly selected state.
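Stripped of timing concerns, the “asynchronous SRAM” behavior is just a 16-entry bit table indexed by the four select lines. A minimal sketch, assuming the lines are named CA2, CA1, CA0, and SEL and packed into a 4-bit address in that order (both the names and the bit order are assumptions here); it deliberately makes no claim about which address holds which drive state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The drive's 16 state bits; bit n is the answer for select address n. */
static uint16_t drive_state_bits;

/* Pack the four select lines into a 4-bit address.
 * Assumed order: CA2 is bit 3, CA1 bit 2, CA0 bit 1, SEL bit 0. */
static unsigned pack_address(bool ca2, bool ca1, bool ca0, bool sel) {
    return ((unsigned)ca2 << 3) | ((unsigned)ca1 << 2) |
           ((unsigned)ca0 << 1) | (unsigned)sel;
}

/* Firmware side: update one state bit when the drive state changes. */
static void set_state_bit(unsigned addr, bool value) {
    assert(addr < 16);
    uint16_t mask = (uint16_t)(1u << addr);
    if (value) {
        drive_state_bits |= mask;
    } else {
        drive_state_bits &= (uint16_t)~mask;
    }
}

/* "Memory read": the value that must appear on the fifth line
 * whenever the select lines hold 'addr'. */
static bool sense_for(unsigned addr) {
    return ((drive_state_bits >> (addr & 0xFu)) & 1u) != 0;
}
```

The table lookup itself is trivial; the hard part is that it has to be re-evaluated every time any select line changes, without the Mac ever announcing when it will sample the result.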

This would be difficult to emulate in a microcontroller, especially if it also needed to do other work besides merely responding to changing address inputs. The microcontroller would need to enable a pin change interrupt on each of the four address lines, and the interrupt service routine would need to read the address lines and set the data line accordingly. In practice it would be even more complicated, because there are other IO lines like /ENABLE that would also need to be considered, and would need their own interrupt handlers.
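Here is a rough sketch of the interrupt approach, written so it compiles and runs on a PC: the GPIO “registers” are plain variables, and the pin positions (`PIN_ADDR_MASK`, `PIN_ENABLE_N`) are hypothetical. Rather than separate handlers, this version uses one shared handler for the four address lines and /ENABLE. That is only one possible design choice, picked because sampling the port once guarantees the address and /ENABLE are read together.

```c
#include <stdint.h>

/* Hypothetical pin assignments within a single GPIO input port. */
#define PIN_ADDR_SHIFT 0u                     /* address lines on bits 0..3 */
#define PIN_ADDR_MASK  (0xFu << PIN_ADDR_SHIFT)
#define PIN_ENABLE_N   (1u << 4)              /* /ENABLE, active low */

/* Stand-ins for memory-mapped GPIO registers so the sketch runs on a PC. */
static volatile uint32_t gpio_in;   /* current level of the input pins */
static volatile int      sense_out; /* 0 or 1 when driven, -1 = released (high-Z) */

/* Precomputed answers for all 16 addresses, maintained by the main loop. */
static volatile uint16_t state_table;

/* Shared pin-change handler for the four address lines and /ENABLE. */
static void pin_change_isr(void) {
    uint32_t pins = gpio_in;              /* one read: all lines sampled together */
    if (pins & PIN_ENABLE_N) {
        sense_out = -1;                   /* drive not selected: release the line */
        return;
    }
    unsigned addr = (pins & PIN_ADDR_MASK) >> PIN_ADDR_SHIFT;
    sense_out = (state_table >> addr) & 1u;
}
```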

Would it be fast enough? Maybe. By itself the code in the ISR would almost surely be fast enough. But there’s overhead to consider – the interrupt service latency, and the time needed to set up the ISR context and exit it again (saving and loading all the registers from the stack). Then there’s the complicating factor of other potential interrupts, or other invocations of this same interrupt, creating additional delays before the ISR actually starts running. And the further complication of some critical code sections where interrupts need to be disabled temporarily, creating yet more delays before the ISR runs.
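One way to reason about all that overhead is to add up a worst case in CPU cycles. The sketch below is a back-of-envelope model, not a measurement; `LatencyBudget` and every number fed into it are assumptions to be replaced with datasheet figures or logic analyzer results. It pessimistically assumes a previous run of the same handler sampled the pins just before they changed, so that run must finish before the handler can re-enter and respond.

```c
#include <stdint.h>

/* Back-of-envelope worst-case response time, in CPU cycles.
 * All inputs are assumptions to be replaced with measured numbers. */
typedef struct {
    uint32_t entry;         /* interrupt latency plus register stacking */
    uint32_t body;          /* sample address pins, look up bit, drive data pin */
    uint32_t exit;          /* register unstacking and return */
    uint32_t longest_block; /* longest interrupts-disabled critical section */
    uint32_t higher_prio;   /* time stolen by higher-priority interrupts */
} LatencyBudget;

static uint32_t worst_case_cycles(const LatencyBudget *b) {
    /* Pessimistic sum that assumes every delay happens at once: a critical
     * section, then higher-priority handlers, then the stale run of this
     * handler finishing (body + exit), then the re-entry that finally
     * drives the right value (entry + body). Hardware shortcuts such as
     * Cortex-M tail-chaining are ignored. */
    return b->longest_block + b->higher_prio
         + (b->body + b->exit)
         + (b->entry + b->body);
}
```

For example, with hypothetical figures of 12 cycles entry, 20 cycles body, 10 cycles exit, and a 30-cycle critical section, the model gives 92 cycles, or about 767 ns at 120 MHz. Because the body is counted twice, keeping the handler short matters more than it first appears.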

How fast does it need to be, anyway? How much time elapses between when the Mac sets the four IO lines and when it reads the fifth line? Unfortunately I don’t know, and the worst part is there’s really no way to find out. Because there’s no external indication of when a read is occurring, there’s nothing I can measure with an oscilloscope or logic analyzer. But I can make some rough guesses, based upon examination of a disassembly of the floppy disk driver code in the Macintosh ROM. On the Macintosh Plus, there’s a whopping 6750 ns between address and data for most state reads, but some reads have a much narrower window of about 1500 ns, assuming I’ve interpreted the code correctly.

Are there other faster read examples, buried somewhere in the Mac Plus ROM that I missed? And do Macs with a higher clock speed than the Plus have correspondingly faster read behaviors of disk state info, or do they insert delay code to keep the speed constant? I don’t know, but the Mac Plus is likely far from the worst case. I’ll make a wild guess that a 500 ns response time would probably be fast enough. Keep in mind I’m talking about the speed between presentation of the four address IO lines, and reading state info on the fifth IO line, which is entirely different from the speed or data rate of the disk data itself. It’s not something that’s specced or defined in any official source; it’s just something that arises as a side-effect of the code in the Mac’s ROM.
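To put these windows in perspective, converting nanoseconds to CPU cycles shows how much work fits inside each one. This small helper (the name `cycles_in_window` is just for illustration) rounds down, since a deadline should be estimated conservatively:

```c
#include <assert.h>
#include <stdint.h>

/* Whole CPU cycles available inside a response window.
 * cycles = ns * MHz / 1000; integer division rounds down,
 * which is the safe direction when estimating a deadline. */
static uint32_t cycles_in_window(uint32_t window_ns, uint32_t clock_mhz) {
    assert(clock_mhz != 0 && window_ns <= UINT32_MAX / clock_mhz); /* no overflow */
    return (window_ns * clock_mhz) / 1000u;
}
```

At 20 MHz, the 6750 ns, 1500 ns, and 500 ns windows hold 135, 30, and 10 cycles. At 120 MHz they hold 810, 180, and 60.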

Despite the uncertainty here, I think there’s a decent chance this could work, so I’m going to try some experiments to test my theories.

If even a 120 MHz microcontroller isn’t fast enough to handle this 4-address-1-data mechanism, I’ve sketched out a possible fallback plan that uses a few discrete logic chips – some muxes and latches and buffers. It’s arguable whether that would be preferable to just using a CPLD (or FPGA), and it would certainly be less flexible. But a small handful of discrete logic combined with a much faster microcontroller could still provide some major advantages, and simplify firmware design by reducing the number of programmable chips on the Emu board from two to one.
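The appeal of the fallback is that it splits the work into a slow path and a fast path. The sketch below models that split in software, with hypothetical names and no specific part numbers: latches hold the 16 state bits and are rewritten by the microcontroller only when drive state actually changes, while a 16:1 mux plus a tri-state buffer answer every address change within gate delays, with no software involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Outputs of the (hypothetical) latch chips holding the 16 state bits. */
typedef struct {
    uint16_t latched;
} StateLatches;

typedef enum { OUT_LOW = 0, OUT_HIGH = 1, OUT_HIGH_Z = 2 } LineLevel;

/* Slow path, microcontroller side: runs only when drive state changes
 * (disk inserted, head moved, ...), not on every address change. */
static void mcu_update_latches(StateLatches *l, uint16_t new_bits) {
    l->latched = new_bits;
}

/* Fast path, pure combinational logic: the 16:1 mux selects a bit with
 * the four address lines, and the tri-state buffer drives the data line
 * only while /ENABLE is low. */
static LineLevel data_line(const StateLatches *l, unsigned addr, bool enable_n) {
    assert(addr < 16);
    if (enable_n) {
        return OUT_HIGH_Z;
    }
    return ((l->latched >> addr) & 1u) ? OUT_HIGH : OUT_LOW;
}
```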

So there’s a lot to think about. And anything that grows out of all this ruminating won’t see the light of day for a long time. But I can’t resist daydreaming…


19 Comments so far

Alex - April 6th, 2018 10:06 pm

Just a couple of thoughts, based on some chips I have used in the past.
The Greenpak chips are fairly cheap (often cheaper than a handful of gates), 5V (and even level-shiftable in some versions), and the Greenpak 5 variants have a hard I2C port for programming and control. I use one as a fully reconfigurable LED and interrupt controller during CPU stop modes, but they are fairly flexible. Note that the OTP versions can be programmed at run time on boot just fine, even on a blank unit. The Greenpak 4 devices have an open source toolchain, but I am not familiar with that family.
Alternatively, for your secondary MCU option, some small 8-bit CPUs have a logic array bolted on. For example, the attiny414 and others from the same family (’214, ’814, ’817, etc.). I think some PIC parts have this too, but I haven’t touched one of those in years. They are cheap and 5V.
Steve - April 7th, 2018 6:00 am

The attiny414 looks very interesting. I had no idea that family had parts with configurable logic! It’s maybe too small and simple, but I’ll check it out, along with the Greenpak. Another microcontroller+logic option is the Cypress PSoC family, but I’d been intentionally avoiding that one due to its tools and general air of unpopularity. I’m tired of using semi-obscure chips that are hard to find and then get discontinued.
Jim Mc - April 7th, 2018 6:08 am

Love my FloppyEmu. It’s invaluable for my Apple collection, however it desperately needs a more stable & larger LCD screen and/or a composite or HDMI output and ROM switching without the need to re-flash between Apple II/Mac.

Thanks for the effort you’re putting in to the retro-computing community. It’s really appreciated!
Alex - April 7th, 2018 8:03 am

Right, I completely forgot about the PSoC stuff. As somebody who has used them at work (PSoC 4 series), it’s not the worst idea to stay away from them. When they work and fit your project perfectly they can be awesome, but if they don’t, you will have a far harder time figuring out why something that the GUI tools claim works, that all the docs claim works, and that isn’t in the errata, just plain does something wacky. I want to like them, I really do, but they have cost us a disproportionate amount of time on simple things that should have just worked. And other times they did everything perfectly.
And yes, the tools do suck more than the average, although I have used worse.

On the Greenpak stuff, I can’t speak to after Silego was bought by Dialog, but before then, they were helpful and responsive and their direct sales (of low quantity blank parts) was easy to work with. The tools are Mac/Win/Linux.
Andrew - April 7th, 2018 9:27 am

On the RAM capacity front w.r.t. simplifying software programming – several of the higher-powered STM32 series ARM chips have a “Flexible Static Memory Controller (FSMC)” that supports external SDRAM – and a 16Mbit (2MByte) 16-bit SDRAM is a buck a chip on digikey. STM32F412 might be a good starting point? about a four-dollar chip… not quite breaking the bank, about the same price as the CPLD you’re using?
Dillon Nichols - April 7th, 2018 9:46 am

You might also want to check out the Intel MAX 10 family as an alternative to the Lattice XO2 chips. The MAX II is an alternative to the XO chips, but it is probably going to stop being supported soon. Lattice promises to make the XO and XO2 chips for 10 more years from today, so you won’t have to replace them any time soon.

You might also want to check out the ESP32. It has 2 cores and flexible high speed logic (not programmable logic but you can play some nice tricks with it) and very cost effective and they’re available on digikey. It also has Bluetooth and wifi so you might be able to find a way to transfer discs wirelessly if you want.
kurth - April 7th, 2018 2:45 pm

Both the PIC32MZ and STM32L4+ have parts with 640K of SRAM. The L4+ parts run at 120 MHz, but the PIC32 runs faster. I tend to prefer the ARM toolchain.
Steve - April 8th, 2018 4:56 pm

“How fast does it need to be, anyway? How much time elapses between when the Mac sets the four IO lines and when it reads the fifth line? Unfortunately I don’t know, and the worst part is there’s really no way to find out.”

I may have found an answer, in the form of these timing diagrams from the Apple 1.44MB superdrive controller chip: https://www.bigmessowires.com/wp-content/uploads/2018/04/Superdrive-controller-spec.pdf

It looks like my guess of 500 ns was exactly right! And in reality there’s probably more leeway than that. Since the spec says the Superdrive chip will respond within at most 500 ns, the Mac software can’t begin to read the result until at least 500 ns after setting the address lines. I’m going to assume the Mac’s disk driver code actually allows something more than 500 ns, to provide some margin of error. And this is the timing for the Superdrive; the original 400K/800K drive controller may have been slower, so the Mac disk driver code might be limited to that slower speed. Slower is good: it gives the microcontroller a better chance to respond with the correct data within the allowed time.

So can I write an all-software implementation, with an interrupt handler that’s guaranteed to respond to inputs with the correct data within 500 ns? I don’t know, but I’m guessing the answer is yes. Especially if I can also leverage some extra hardware on the microcontroller like an event system or configurable logic. I discovered that the Atmel SAML21 has a small configurable logic peripheral, so that may be a good place to start, even though its max clock speed is a low-ish 48 MHz.
cybernesto - April 9th, 2018 12:56 am

Have you thought about using a Parallax Propeller? https://www.parallax.com/microcontrollers/propeller
You could dedicate a cog to the time-critical IO lines.
John Burton - April 9th, 2018 3:54 am

Could you go the other way and use a small fpga instead of a microcontroller to handle much more of the logic and run a tiny soft CPU on it for controlling it and not have a separate microcontroller?
- April 9th, 2018 11:12 am

There are also many SoCs out there with more than one core, for example a Cortex-M0+ and a Cortex-M3. If you had one core to just handle the timing-critical stuff, you might get away without using any interrupts at all, which should reduce the latency a lot.
Steve - April 9th, 2018 12:13 pm

What do you mean by more than one core? I’m not aware of any multi-core Cortex MCUs, especially not in this price range. Cortex-M0+ is a single core, no? See https://community.cypress.com/docs/DOC-10652
Alex - April 9th, 2018 3:44 pm

As an example of parts with more than one core, the LPC4300 series parts are the first that come to mind without searching. Digi-key also lists the LPC54100 series as a cheaper and slower family. Ex. LPC54114J256BD64QL
I have never used them so cannot speak to their merits.
- April 9th, 2018 9:07 pm

Yes, I was thinking of the LPC4300 series. Mostly because I’ve played around with a rad1o (https://rad1o.badge.events.ccc.de/ , unfortunately has a broken SSL certificate), which has a LPC4330.

At 204 MHz, you get more than 100 cycles during the 500 ns window. If all the data that the Apple could possibly ask for is already precomputed by the other core, it should be easy to meet that timing 🙂
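With one core dedicated to the IO lines, the remaining question is how the other core hands over its precomputed answers safely. A minimal sketch using C11 atomics, assuming a single writer core and that the target supports lock-free 16-bit atomic loads and stores visible across both cores (worth verifying for any particular dual-core part, for example with `atomic_is_lock_free`); `publish_state_bit`, `poll_once`, and the /ENABLE bit position are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Private copy, touched only by the housekeeping core (single writer). */
static uint16_t local_state;
/* Published copy, read by the dedicated IO core. */
static _Atomic uint16_t shared_state;

/* Housekeeping core: update one bit, then publish the whole table in one
 * atomic store so the IO core never sees a half-updated table. */
static void publish_state_bit(unsigned addr, bool value) {
    assert(addr < 16);
    uint16_t mask = (uint16_t)(1u << addr);
    local_state = value ? (uint16_t)(local_state | mask)
                        : (uint16_t)(local_state & (uint16_t)~mask);
    atomic_store_explicit(&shared_state, local_state, memory_order_release);
}

/* IO core: one pass of an interrupt-free polling loop.
 * 'pins' holds a fresh sample: bits 0..3 = address, bit 4 = /ENABLE. */
static int poll_once(uint32_t pins) {
    if (pins & (1u << 4)) {
        return -1;                        /* /ENABLE high: release the line */
    }
    uint16_t table = atomic_load_explicit(&shared_state, memory_order_acquire);
    return (table >> (pins & 0xFu)) & 1u; /* 0 or 1 to drive on the data line */
}
```

The IO core would call `poll_once` on each fresh sample of its input port inside an endless loop with interrupts disabled, so its response time is bounded by one loop iteration rather than by interrupt latency.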
Joel Dillon - April 12th, 2018 6:00 am

I’ve not actually ever used one, but people say ‘multicore microcontroller’ and these come to mind –

https://www.parallax.com/microcontrollers/propeller
Jon - July 14th, 2018 11:27 pm

I recently sorted through my vintage computers, and read about the Floppy Emu and was going to buy one – but I then read about the Apple Sauce and the .woz format, and decided I really needed something that could support .woz images. How soon do you think before Floppy Emu will support the .woz format?
Steve - July 15th, 2018 7:34 am

The .woz format isn’t well-suited for hardware disk replacements like Floppy Emu, and there are no current plans to add it. It represents copy-protected Apple II disk data in a substantially different way than other disk formats, and requires preloading the entire disk image into RAM.
xot - September 5th, 2018 1:22 pm

Ah, well, I guess that’s my question answered. I’d love to see .woz support but understand if it isn’t feasible.
Paul Bruner - December 9th, 2025 4:21 am

This is a bit late, but for a single-chip solution you could look at the PSoC 5. It can have separate voltage domains (5V for one port, 3.3V for another), and it has hardware blocks with built-in state machines, so you could translate the GCR on the fly as well as do DPLL sync. There was an old floppy emulator based on it somewhere on GitHub, but I haven’t looked at it in years.
