Yellowstone 3.5 Inch Drive Support!

Success! My Yellowstone disk controller card for the Apple II now works with 3.5 inch floppy drives! Along with the previously-implemented 5.25 inch floppy and Smartport HD support, this completes the triumvirate of Apple II disk drives. While it’s still very rough around the edges, I now have a working universal disk controller for Apple II that can handle any type of disk drive. This is exciting, because existing disk controllers for two of the three drive types are rare, expensive, or both. After a very long period of slow progress, I feel that everything’s finally starting to come together.

I’m especially pleased to see 3.5 inch floppies working, because 1 MHz Apple II machines like my Apple IIe theoretically aren’t fast enough to keep up with the higher bit rate of 3.5 inch disks. There’s not enough time for the CPU to poll for a new byte, store it, and get ready for the next byte before it’s already passed by. The official 3.5 inch disk controller card from Apple solves this problem by placing an entire second computer on the disk controller, with its own 2 MHz 6502 CPU, RAM, and ROM. But Yellowstone uses some Very Special Tricks in hardware to achieve 3.5 inch floppy support on the 1 MHz CPU. It borrows a technique from the UDC disk controller, and forces the computer’s READY signal low to halt the CPU until a new disk byte is ready. This eliminates the need for software polling, and shaves just enough cycles to make everything work.
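
To put rough numbers on that timing squeeze, here is a back-of-the-envelope sketch. The bit cell times (4 µs for 5.25 inch, 2 µs for 3.5 inch), the 8 bit cells per disk byte, and the two-instruction polling loop are assumptions chosen for illustration, not measurements taken from Yellowstone.

```python
# Rough cycle budget per disk byte on a 1 MHz 6502.
# Assumed figures (not from the text above): 4 us bit cells for 5.25 inch disks,
# 2 us bit cells for 3.5 inch disks, 8 bit cells per disk byte, and a polling
# loop of LDA abs,X (4 cycles) followed by BPL taken (3 cycles).

CPU_HZ = 1_000_000          # nominal; a real Apple II runs slightly faster
BITS_PER_DISK_BYTE = 8
POLL_LOOP_CYCLES = 4 + 3    # one trip around the assumed polling loop


def cycles_per_byte(bit_cell_us):
    """CPU cycles that elapse while one disk byte passes under the head."""
    return bit_cell_us * BITS_PER_DISK_BYTE * CPU_HZ / 1_000_000


def worst_case_work_cycles(bit_cell_us):
    """Cycles left for storing the byte if it arrives just after a poll."""
    return cycles_per_byte(bit_cell_us) - POLL_LOOP_CYCLES


for name, cell_us in (("5.25 inch", 4), ("3.5 inch", 2)):
    print(name, cycles_per_byte(cell_us), worst_case_work_cycles(cell_us))
```

Under these assumptions the 3.5 inch case leaves only a handful of cycles per byte once a polling loop is in the picture, which is why stalling the CPU until the byte is ready makes such a difference.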

So now what? This is just the beginning; a proof of concept more than a finished project. I’ve only done the most cursory testing, and I’m sure there are many compatibility problems still to address, and devilish bugs to find and fix. I know about a few of them already. Here’s some of my planned testing:

  • Test all disk types with more thorough read and write tests
  • Try all the copy-protected 5.25 inch disks that rely on weird behaviors
  • Test formatting disks
  • Test with other cards installed in every slot
  • Test with an NMOS 6502 CPU
  • Boot from another disk controller and access Yellowstone as a secondary
  • Use with Apple IIGS at fast speed
  • Boot GSOS from 3.5 inch floppy and Smartport HD

Yay for testing. It’s important, but not very fun, and I may enlist some beta testers to help. The other big task still ahead is to design the second hardware version of the Yellowstone card. The current prototype has sprouted extra patch wires and even an extra chip glued to the board, but it’s still missing some key features. Here are some of the items on my to-do list:

  • Remove SPI ROM
  • Add external SRAM
  • Add a second disk connector
  • Add a switch to select between 2-port operation or daisy-chain 1-port
  • Buffer to isolate FPGA from disk signals
  • Connect Q3 and IOSTROBE to FPGA clock pins
  • Connect the upper 4 address lines
  • Connect RDY and PHI1
  • Add output buffer for RDY
  • Make better / wider GND connections
  • Improve the bypass capacitors
  • Allowance for in-circuit JTAG / SPI reprogramming
  • Allowance for self-test or external test
  • Add open-drain buffers or inline resistors for disk signals with multiple drivers
  • Label all the unlabeled pins and ports on the card
  • Switch to a bigger LDO voltage regulator
  • Add more power and ground test points
  • Round the corners on PCB

That ought to keep me busy for a while.


Yellowstone 5.25 Inch and Smartport Support

Way back in 2017, I began development of Yellowstone, an FPGA-based disk controller card for the Apple II. It’s hard to believe it’s been almost four years. Sometimes I feel like I’m moving inch by inch along a journey of a thousand miles. But today I made some significant progress, and a milestone of sorts.

My earlier efforts were focused on duplicating the functionality of the Liron disk controller card (for Unidisk 3.5 and Smartport hard drives), and separately on the Disk II controller card for 5.25 inch drives. More recently I’ve been working on mirroring the functionality of the UDC disk controller, because it supports all of those types of drives as well as standard 3.5 inch drives, and it can automatically detect drive types. Today’s good news: when configured as a UDC, Yellowstone is now able to auto-detect, read, and write Unidisk, Smartport, and 5.25 inch drives! That’s basically everything the UDC offers except for 3.5 inch drive support, which is my next goal. It may sound like I’ve merely duplicated what I had before, but the UDC works quite differently from other disk controllers, and reaching this point means I’m getting closer to unlocking the full potential of a “universal” disk controller.

I’ve discovered some interesting things along the way. Many readers probably know that the Disk II controller card was Apple’s first disk controller, designed by Woz and built entirely from simple off-the-shelf parts. Later Apple developed the IWM or “Integrated Woz Machine”, which did everything the Disk II controller could do, all in a single chip. But the IWM wasn’t just a replacement for the Disk II circuitry; it modified and extended it in a backwards-compatible way. The differences are subtle, but important. For example, if you write a byte to the Disk II controller while the disk is turned off, it has no effect. But if you write a byte to the IWM while the disk is turned off, it updates a configuration register that controls some interesting extra features.

I had been working towards creating an accurate IWM model, but what I discovered is that the code in the UDC’s ROM doesn’t really work with an IWM. It expects to be paired with some custom logic whose behavior is much closer to the original Disk II controller. It does strange-seeming things, like reading and writing from Smartport hard drives without ever asserting the drive’s enable signal. And it leaves a few other mysterious behaviors where I must guess at what’s intended. I’ve had to modify my Verilog design, breaking some of the IWM behaviors in order to match the UDC expectations. That makes me a little uneasy; I don’t really want to maintain two different designs. Fortunately the differences aren’t extensive.

3.5 inch drive support is the next step, but there’s so much to do beyond that. The real UDC card supports two independent disk connectors, but Yellowstone only has one. Some later versions of the UDC also support daisy-chaining drives, which I’d love to get working for Yellowstone too. Unfortunately there are UDC versions with Smartport support, and versions with daisy-chaining support, but none with both. Combining the two may be a major challenge.

After Yellowstone is functionally complete, there will still be plenty more work to do. I need to redesign the board to better isolate the FPGA from any 5V signals, and generally make it as robust and foolproof as possible. I need to decide what I’m doing about the required DB-19 female connectors, which are stupidly difficult to find, though I do have some. I need to revisit all that ROM packing stuff I described here recently, to see if I can’t squeeze everything into a more common FPGA with a lower cost. I probably need to build some kind of end-user reprogramming capability too, to allow for bug fixes or new features. I’m using a JTAG programmer and the Lattice development software, but that’s not a user-friendly solution. That could become a major project in its own right. And finally I need to design a self-test capability, or an external testing rig, that can be used to verify large numbers of boards (relatively speaking) after they’ve been assembled. At the rate I’m going, I’ll be busy for a long time!


Yellowstone Progress Update

I’m still working on development of an FPGA-based disk controller card for the Apple II – Yellowstone. Over the past couple of months, I spent a long while analyzing the design of the UDC disk controller. The UDC supports all three major types of Apple II disk drives, making it a promising place to begin learning. After that I spent a long while more exploring how I might squeeze the UDC’s 8K of ROM and 2K of RAM into the limited resources of Yellowstone’s FPGA. Just recently I finally finished up those investigations and returned to actively building and testing the Yellowstone card. Unfortunately it still doesn’t work.

I built a second Yellowstone prototype, identical to the original except for selecting a Lattice MachXO2-2000 FPGA instead of a MachXO2-1200. This new chip is just barely large enough to hold the necessary ROM and RAM for my UDC pseudo-work-alike Verilog code. I’m not sure if I’ll use this solution for the final edition of Yellowstone, or if I’ll use a smaller MachXO2 version paired with a separate ROM or RAM, but at least I’m up and running again.

The card seems to work as expected when I probe its memory space from the Apple II monitor. I can access all 8K of ROM via its custom bank-switching logic, and its 2K of RAM also through bank switching. I can probe its soft-IWM and watch the disk I/O lines change. Everything looks OK. But when I try to actually boot a 5.25 inch disk, it just freezes the computer.

It’s not completely dead; it does do *something*. The disk drive turns on and spins. Using a logic analyzer, I can see some brief activity on the disk I/O lines that I interpret as “hello, are you a 3.5 inch drive?” before it goes silent. If I then reset the Apple II and examine some memory locations where I know the UDC stores status info, I can see that it detected one disk drive. But why didn’t it boot? More importantly, why did it freeze?

If this were a normal software program, I could use a debugger to interrupt the program and see where it’s frozen. That alone might be enough to reveal what’s wrong. If not, I could restart the program from the beginning, and step through it line-by-line until I found the problem. But nothing like that is possible here. There’s no facility for Apple II breakpoints or single-stepping through code that’s in ROM, and even if there were, the I/O code is timing-dependent and would likely break when run in the debugger. The poor man’s debugger is printed log messages, flashing LEDs, and similar indicators, but even that will be difficult. I can’t easily add or edit code in the UDC ROM, because it contains lots of absolute address references as well as implicit assumptions about certain chunks of code and data avoiding page boundaries.

I wish I still had my old HP 1631D logic analyzer. Then I could hook up 24 probes to the Apple II’s address bus and data bus and then let the computer run, examining the logged CPU cycles afterwards using the HP’s state listing view. My Saleae logic analyzer is nice for many tasks, but even if it had 24 probes, it’s basically only designed for timing / waveform views. I guess not many people look at parallel busses anymore.


Floppy Emu Update: WOZ Writes, Dual 5.25 Emulation, and More!

Today I’m excited to introduce a major new feature update for the BMOW Floppy Emu Disk Emulator. The latest firmware for use with Apple II computers adds the three most frequently requested new features: writeable WOZ and NIB disk images, formattable disk images, and dual 5.25 inch floppy drive emulation. Each of these three was tricky to implement, and I’ve lost track of how many times I said these things were too complex or the hardware couldn’t support it. Forget all that, because here it is! This is the biggest set of changes to the firmware in the whole history of the Floppy Emu, requiring some extensive rework under the hood. I hope the results are worth it.

The new firmware is version 0.2Q-F29. You can download the latest firmware here: firmware

WOZ and NIB disk images are now writeable!

This is great for copy-protected programs that need to save some information to the disk, like your Print Shop printer settings or your Castle Wolfenstein game progress. While the sector-oriented disk image types like .PO, .DO, and .DSK have supported writes since day one, the bitstream disk images (WOZ and NIB) typically used for copy-protected programs were previously treated as read-only by the emulation engine. No more. Your Wolfenstein games will now be safely saved.

Getting this to work was challenging, because the Floppy Emu wasn’t originally designed to handle raw bitstreams. A few simplifying assumptions were made in order to bridge the gap. It’s possible a few programs may cause trouble if they perform writes in a very non-standard way, though I haven’t yet found any real-world examples. The firmware doesn’t make assumptions about sector sizes or layouts or headers, so it should work OK even if the written data doesn’t look like standard DOS/ProDOS sectors. For example, Wolfenstein writes sectors whose data header begins with the bytes D5 DA AD instead of the normal D5 AA AD.

WOZ disk images contain a read-only flag in their header. Writing to the disk image will be disabled if this flag is set.
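
For anyone who wants to check that flag on their own disk images, here is a minimal sketch. It assumes the layout I understand from the WOZ specification (a 12-byte file header, then an INFO chunk whose data begins at offset 20, with the write-protected flag in the third data byte), so double-check it against the spec before relying on it. The helper names are made up for this example and are not part of the Floppy Emu firmware.

```python
import struct


def woz_is_write_protected(data: bytes) -> bool:
    """Return True if the INFO chunk marks the disk image as write protected.

    Assumed layout: 'WOZ1'/'WOZ2' signature and CRC in the first 12 bytes,
    'INFO' chunk ID at offset 12, chunk size at 16, chunk data from offset 20,
    write-protected flag at offset 22 (1 = protected).
    """
    if data[:4] not in (b"WOZ1", b"WOZ2"):
        raise ValueError("not a WOZ image")
    if data[12:16] != b"INFO":
        raise ValueError("INFO chunk not where expected")
    return data[22] == 1


def make_fake_woz_header(write_protected: bool) -> bytes:
    """Build a synthetic header for testing; not a complete, valid disk image."""
    header = b"WOZ2" + b"\xff\x0a\x0d\x0a" + b"\x00" * 4   # signature + CRC placeholder
    info = bytes([2, 1, 1 if write_protected else 0]) + b"\x00" * 57
    return header + b"INFO" + struct.pack("<I", len(info)) + info


print(woz_is_write_protected(make_fake_woz_header(True)))
print(woz_is_write_protected(make_fake_woz_header(False)))
```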

You can format 5.25 inch disks!

This is great when making disk copies, or if you need to create a save disk from within a game like King’s Quest. In-emulator disk formatting makes life much more convenient when copying or saving data. I have striven to make formatting a feature that “just works” as much as possible, but there are a few important details to know.

Successful formatting is dependent on the write caching behavior of your SD card. If the card introduces too many long delays during writes while it flushes its internal cache, the format may fail. In most cases you can just try again and it’ll work. If the format keeps failing, try a different SD card.

Many disk duplication tools do a simultaneous format and write of the target disk. Some will also do a bit-level copy, attempting to duplicate hidden details of copy-protected source disks. Cracking tools like Copy II+ may not work correctly when the target disk is a Floppy Emu disk image. Use DOS 3.3’s COPYA or ProDOS’s Duplicate Disk utility, both of which work fine.

Formatting is also dependent on the type of disk image used, and potentially also on the disk image’s metadata. In short: if you need to format a disk, use this Blank.woz disk image and you’ll be fine. Only NIB and WOZ disk image types support true formatting. All other types will retain standard DOS/ProDOS sector layouts and volume number 254 regardless of how you attempt to format them. NIB disk images have tracks that are about 4% bigger than normal, which is sort of like a disk drive that spins 4% too slowly. DOS 3.3 still formats a NIB just fine, as will games using normal RWTS routines, but ProDOS will complain the drive is too slow. Formatting WOZ disk images is more reliable, but it must be a WOZ image of a normal disk with normal track lengths and track layouts. If you just grab some copy-protected WOZ disk image and try to format it, it may fail. The provided Blank.woz should format nicely using most Apple II programs.

Dual 5.25 inch drive emulation is here!

Now the Emu hardware can emulate two 5.25 inch drives at once, which is great for two-disk games and for reducing disk swapping. This feature is available on the Floppy Emu Model C, so you may wish to consider an upgrade if you have an older model and frequently use 5.25 inch floppy emulation. Dual 5.25 emulation mode is compatible with any Apple II computer or 5.25 inch disk controller with a 19-pin D-SUB (DB-19) connector, except the Apple IIc. The IIc and the rectangular 10×2 pin disk connector both lack the necessary disk I/O signal for controlling a second drive. Both could theoretically be supported in the future with some kind of Y-cable adapter that plugs in to two separate disk connectors. Coming soon, maybe?

Don’t use Dual 5.25 mode in combination with the optional BMOW Daisy Chainer or A/B Switch. It will cause disk errors and could damage the Floppy Emu or your daisy-chained 5.25 inch drive.

Dual 5.25 mode emulates two daisy-chained 5.25 inch drives on a single Floppy Emu board. When using this mode, care must be taken to avoid accidentally creating a forked daisy chain with two branches. This could cause the Floppy Emu and another daisy-chained or A/B-connected 5.25 inch drive to fight with each other, possibly damaging them both. To avoid this, select single 5.25 inch emulation mode when using the Daisy Chainer or A/B switch.

……One More Thing

Effective immediately I’m cutting the price of the Floppy Emu Model C by $10 and the Floppy Emu Deluxe Bundle by $20. I’ve been testing these new prices for a little while now as a temporary sale, and based on the response I’ve decided to make the new prices permanent. This puts the Model C at $99 and the Deluxe Bundle at $119. I hope this helps to get hardware into a few more hands of vintage Apple enthusiasts.

Thanks for everybody’s support and enthusiasm over the years that I’ve been doing this. Sometimes the road is bumpy, but it’s always a pleasure hearing from people about the ways they’ve put their BMOW hardware to use. I’m happy to have contributed something to this hobby and its amazing community.


FPGA Block RAM Packing

In an earlier blog post, I was lamenting how one-ninth of an FPGA block RAM was wasted when storing 8-bit ROM data, because there’s no simple way to make use of the 9th parity bit in each word of a block RAM. Horrors! To fight this injustice, I’ve developed a solution that I call packed ROM. It stores nine 8-bit bytes in eight 9-bit words of block RAM, and provides an interface to read the data as if it were an 8-bit memory with a larger depth. Using this method, I’m able to store 1152 bytes of read-only data per block RAM instead of only 1024. The solution relies on the fact that the block RAMs are dual port – you can read from two different addresses simultaneously. Compared with using the same number of block RAMs as a standard 8-bit wide ROM, this solution consumes an extra 54 LUT4s in a MachXO2-1200 FPGA – about 4 percent of the total. It increases the MachXO2-1200’s effective capacity for this type of 8-bit ROM data from 7168 to 8064 bytes.

Here’s the Verilog code, as well as a Python program that reads a plain binary file and writes a “packed” file in .mem format. The code assumes 7 block RAMs, but should be easily adaptable to other numbers.

module packedROM #(parameter NUM_BLOCK_RAMS = 7) (
	input [12:0] addr,
	input clk,
	output reg [7:0] Q
);
	// packs 1152*NUM_BLOCK_RAMS 8-bit data bytes into 1024*NUM_BLOCK_RAMS 9-bit words
	// uses 54 LUT4s of the MachXO2
	// may need to change addr width depending on NUM_BLOCK_RAMS. Use $clog2()?
	// nine bytes A-I are packed into eight 9-bit words as follows:
	// 0: I3 I2 I1 I0 A4 A3 A2 A1 A0
	// 1: I7 I6 I5 I4 B4 B3 B2 B1 B0
	// 2: F7 F6 E7 E6 C4 C3 C2 C1 C0
	// 3: H7 H6 G7 G6 D4 D3 D2 D1 D0
	// 4: A7 A6 A5 E5 E4 E3 E2 E1 E0
	// 5: B7 B6 B5 F5 F4 F3 F2 F1 F0
	// 6: C7 C6 C5 G5 G4 G3 G2 G1 G0
	// 7: D7 D6 D5 H5 H4 H3 H2 H1 H0
	// bytes A-H are sequential in the byte-oriented address space below addr 1024*NUM_BLOCK_RAMS
	// byte I is one of the "extra" bytes, in byte-oriented address space beyond addr 1024*NUM_BLOCK_RAMS

	reg [12:0] wordAddressA;
	reg [12:0] wordAddressB;
	wire [8:0] QA;
	wire [8:0] QB;

	// dualPortROM is a wrapper for the MachXO2 block RAMs, created by the Lattice IP Express tool.
	// it is actually a dual port RAM with the write input unused
	// the port names below are assumptions, based on a typical IP Express true dual-port RAM
	// wrapper; match them to the module that the tool actually generates
	dualPortROM myDualPortROM(
		.DataInA(9'b0), .DataInB(9'b0),
		.AddressA(wordAddressA), .AddressB(wordAddressB),
		.ClockA(clk), .ClockB(clk),
		.ClockEnA(1'b1), .ClockEnB(1'b1),
		.WrA(1'b0), .WrB(1'b0),
		.ResetA(1'b0), .ResetB(1'b0),
		.QA(QA), .QB(QB)
	);

	wire [12:0] overflowAddr = addr - (NUM_BLOCK_RAMS * 1024);

	// purely combinational address and output selection, so blocking assignments are used
	always @* begin
		if (addr < NUM_BLOCK_RAMS * 1024) begin
			// packed area, bytes A-H
			wordAddressA = addr;
			// word address for the upper bits depends on low three bits of the byte address
			case (addr[2:0])
				0: begin // A
					wordAddressB = { addr[12:3], 3'b100 };
					Q = { QB[8:6], QA[4:0] };
				end
				1: begin // B
					wordAddressB = { addr[12:3], 3'b101 };
					Q = { QB[8:6], QA[4:0] };
				end
				2: begin // C
					wordAddressB = { addr[12:3], 3'b110 };
					Q = { QB[8:6], QA[4:0] };
				end
				3: begin // D
					wordAddressB = { addr[12:3], 3'b111 };
					Q = { QB[8:6], QA[4:0] };
				end
				4: begin // E
					wordAddressB = { addr[12:3], 3'b010 };
					Q = { QB[6:5], QA[5:0] };
				end
				5: begin // F
					wordAddressB = { addr[12:3], 3'b010 };
					Q = { QB[8:7], QA[5:0] };
				end
				6: begin // G
					wordAddressB = { addr[12:3], 3'b011 };
					Q = { QB[6:5], QA[5:0] };
				end
				7: begin // H
					wordAddressB = { addr[12:3], 3'b011 };
					Q = { QB[8:7], QA[5:0] };
				end
			endcase
		end
		else begin
			// overflow area, byte I
			// word address is byte overflow address times 8 for the lower bits, and times 8 plus 1 for the upper bits
			wordAddressA = { overflowAddr[9:0], 3'b000 };
			wordAddressB = { overflowAddr[9:0], 3'b001 };
			Q = { QB[8:5], QA[8:5] };
		end
	end
endmodule

import os
from array import array

infile = "coderom.bin"
outfile = "coderom.mem"

num_block_rams = 7
outsize = 1024 * num_block_rams       # number of 9-bit words to write
needed = outsize + outsize // 8       # input bytes consumed: packed area plus overflow area

inputData = array('B')
insize = os.path.getsize(infile)
with open(infile, 'rb') as f:
    inputData.fromfile(f, insize)

# pad a short input file with zeros so the overflow-area lookups stay in range
if len(inputData) < needed:
    inputData.extend([0] * (needed - len(inputData)))

# one 9-bit word per output line, in hex; x is the word address
with open(outfile, "w") as out:
    for x in range(0, outsize):
        baseAddr = x & ~7
        if x & 7 == 0:
            out.write('{:02X}\n'.format( (((inputData[outsize+baseAddr//8])&0xF)<<5) | ((inputData[baseAddr])&0x1F)))
        elif x & 7 == 1:
            out.write('{:02X}\n'.format( (((inputData[outsize+baseAddr//8])&0xF0)<<1) | ((inputData[baseAddr+1])&0x1F)))
        elif x & 7 == 2:
            out.write('{:02X}\n'.format( (((inputData[baseAddr+5])&0xC0)<<1) | (((inputData[baseAddr+4])&0xC0)>>1) | ((inputData[baseAddr+2])&0x1F)))
        elif x & 7 == 3:
            out.write('{:02X}\n'.format( (((inputData[baseAddr+7])&0xC0)<<1) | (((inputData[baseAddr+6])&0xC0)>>1) | ((inputData[baseAddr+3])&0x1F)))
        elif x & 7 == 4:
            out.write('{:02X}\n'.format( (((inputData[baseAddr])&0xE0)<<1) | ((inputData[baseAddr+4])&0x3F)))
        elif x & 7 == 5:
            out.write('{:02X}\n'.format( (((inputData[baseAddr+1])&0xE0)<<1) | ((inputData[baseAddr+5])&0x3F)))
        elif x & 7 == 6:
            out.write('{:02X}\n'.format( (((inputData[baseAddr+2])&0xE0)<<1) | ((inputData[baseAddr+6])&0x3F)))
        elif x & 7 == 7:
            out.write('{:02X}\n'.format( (((inputData[baseAddr+3])&0xE0)<<1) | ((inputData[baseAddr+7])&0x3F)))
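
A quick way to gain confidence in the packing layout is a software round trip: pack some data, then read every byte back using the same two-word lookup the Verilog performs. The sketch below is an illustrative model only. The function names are invented for this example, it works on in-memory lists rather than .mem files, and it ignores the one-clock latency of the real block RAM.

```python
import random


def pack_words(data, num_block_rams=7):
    """Pack bytes into 9-bit words using the layout from the Verilog comments."""
    n = 1024 * num_block_rams
    words = []
    for x in range(n):
        base = x & ~7
        k = x & 7
        if k < 2:
            extra = data[n + base // 8]                # byte I from the overflow area
            hi = (extra & 0x0F) if k == 0 else (extra >> 4)
            words.append((hi << 5) | (data[base + k] & 0x1F))
        elif k < 4:
            lo_src = data[base + 4 + 2 * (k - 2)]      # E for word 2, G for word 3
            hi_src = data[base + 5 + 2 * (k - 2)]      # F for word 2, H for word 3
            words.append(((hi_src & 0xC0) << 1) | ((lo_src & 0xC0) >> 1)
                         | (data[base + k] & 0x1F))
        else:
            words.append(((data[base + k - 4] & 0xE0) << 1) | (data[base + k] & 0x3F))
    return words


def read_byte(words, addr, num_block_rams=7):
    """Software model of the two-port read done by the packedROM module."""
    n = 1024 * num_block_rams
    if addr < n:
        k = addr & 7
        group = addr & ~7
        qa = words[addr]
        if k < 4:                                      # A-D: top 3 bits live in words 4-7
            qb = words[group + 4 + k]
            return ((qb >> 6) << 5) | (qa & 0x1F)
        qb = words[group + (2 if k < 6 else 3)]        # E-H: top 2 bits live in words 2-3
        top = (qb >> 5) & 3 if k % 2 == 0 else (qb >> 7) & 3
        return (top << 6) | (qa & 0x3F)
    over = addr - n                                    # byte I: split across words 0 and 1
    qa, qb = words[over * 8], words[over * 8 + 1]
    return ((qb >> 5) << 4) | (qa >> 5)


rng = random.Random(0)
rom = bytes(rng.randrange(256) for _ in range(8064))   # 7 * 1152 bytes of test data
words = pack_words(rom)
mismatches = [a for a in range(len(rom)) if read_byte(words, a) != rom[a]]
print("mismatches:", len(mismatches))
```

If the layout comments, the packer, and the reader all agree, the mismatch list comes back empty for every one of the 8064 byte addresses.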


When 64 Kbits Is Not 8 Kbytes

This FPGA-based disk controller project is going to need every byte of on-chip memory that I can scrounge up. The datasheet says my Lattice MachXO2-1200 has 64 Kbits of embedded RAM (EBR). See the shaded column in the table above. 64 Kbits is 8 Kbytes, and I plan to store 8 KB of 6502 program code, so that looks perfect. Except that I misinterpreted the table in two different ways.

Looking more closely at the table, there are 7 EBR blocks and each block is 9 Kbits. That’s a total of 63 Kbits, not 64. The datasheet is just wrong here, or they’re using some very liberal rounding method. I just lost 1 Kbit!

That’s not the worst of it. Later in the datasheet, it mentions that each EBR block can be configured as 8192 x 1, 4096 x 2, 2048 x 4, or 1024 x 9. Only the last of those configurations represents 9 Kbits of data. If you want to store 8-bit wide data, you have to use the 1024 x 9 configuration and throw away the extra bit. So that’s another 1 Kbit lost from each block.

When you add everything up, if you’re storing 8-bit wide data, the MachXO2-1200 can only store 56 Kbits in EBR rather than the advertised 64 or 63 Kbits. That may sound like a small difference, but it will have a big impact on my design.
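
The accounting is easy to check with a few lines of arithmetic. This sketch uses only the block count and configurations quoted above; the function names are made up for the example.

```python
EBR_BLOCKS = 7
# depth x width configurations listed in the datasheet
CONFIGS = [(8192, 1), (4096, 2), (2048, 4), (1024, 9)]


def bits_per_block(depth, width):
    """Total bits one EBR block holds in a given configuration."""
    return depth * width


def usable_bits_for_bytes(blocks=EBR_BLOCKS):
    """Bits available for 8-bit data: 1024 x 9 with the ninth bit ignored."""
    return blocks * 1024 * 8


for depth, width in CONFIGS:
    print(f"{depth} x {width}: {bits_per_block(depth, width)} bits")

print("raw capacity:", EBR_BLOCKS * bits_per_block(1024, 9) // 1024, "Kbits")
print("usable as bytes:", usable_bits_for_bytes() // 1024, "Kbits")
```

Only the 1024 x 9 configuration reaches 9216 bits per block, and ignoring the ninth bit drops the total from 63 Kbits to 56 Kbits, or 7168 bytes.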

Sure I could add an external SRAM, and maybe I’ll do that eventually, but I really want to squeeze all the advertised memory space from this chip. The wasted area offends my engineering sensibilities. So I’ve been brainstorming a couple of crazy solutions, and wondering if anyone else has ever tried something similar.

I could pack the 8-bit byte data like a bitstream into consecutive 9-bit words of an EBR block, so nine bytes A through I would be stored in eight words:

EBR0[0]: B0 A7 A6 A5 A4 A3 A2 A1 A0
EBR0[1]: C1 C0 B7 B6 B5 B4 B3 B2 B1
EBR0[2]: D2 D1 D0 C7 C6 C5 C4 C3 C2
EBR0[3]: E3 E2 E1 E0 D7 D6 D5 D4 D3
EBR0[4]: F4 F3 F2 F1 F0 E7 E6 E5 E4
EBR0[5]: G5 G4 G3 G2 G1 G0 F7 F6 F5
EBR0[6]: H6 H5 H4 H3 H2 H1 H0 G7 G6
EBR0[7]: I7 I6 I5 I4 I3 I2 I1 I0 H7

To read a byte would require reading two separate words, and then doing some bit shifting and masking with the results. I’d need some kind of state machine and an appropriate clock to handle the two separate reads. And I’d need some kind of divide by nine logic (or eight-ninths?) to convert a byte-oriented address to the corresponding 9-bit word address.
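
As a software illustration of this first idea, here is a minimal sketch of the bitstream layout and the two-word read. It is a Python model, not FPGA logic, and the function names are invented for the example. The `divmod` by nine is the address conversion that would need to become hardware.

```python
def pack_bitstream(data: bytes):
    """Pack 8-bit bytes back to back into 9-bit words, least significant bit first."""
    total_bits = len(data) * 8
    stream = int.from_bytes(data, "little")     # byte n occupies bits 8n .. 8n+7
    num_words = -(-total_bits // 9)             # ceiling division
    return [(stream >> (9 * w)) & 0x1FF for w in range(num_words)]


def read_bitstream_byte(words, n):
    """Fetch byte n using two word reads plus a shift and a mask."""
    word_index, bit_offset = divmod(8 * n, 9)   # the "divide by nine" step
    lo = words[word_index]
    # the second read is only needed when the byte straddles a word boundary
    hi = words[word_index + 1] if word_index + 1 < len(words) else 0
    return ((lo | (hi << 9)) >> bit_offset) & 0xFF


sample = bytes(range(256)) * 4 + bytes(128)     # 1152 bytes fill one 1024 x 9 block
packed = pack_bitstream(sample)
print(len(packed), all(read_bitstream_byte(packed, i) == b for i, b in enumerate(sample)))
```

The model confirms that 1152 bytes fit in 1024 words and that any byte can be recovered from at most two adjacent words, but it says nothing about how cheaply the division and shifting could be built from LUTs.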

A second idea is to leverage the fact that these are seven separate EBR blocks, and to read from them all in parallel, combining one or two bits from each to reassemble the byte:

EBR0[0]: I0 H0 G0 F0 E0 D0 C0 B0 A0
EBR1[0]: I1 H1 G1 F1 E1 D1 C1 B1 A1
EBR2[0]: I2 H2 G2 F2 E2 D2 C2 B2 A2
EBR3[0]: I3 H3 G3 F3 E3 D3 C3 B3 A3
EBR4[0]: I4 H4 G4 F4 E4 D4 C4 B4 A4
EBR5[0]: I5 H5 G5 F5 E5 D5 C5 B5 A5
EBR6[0]: E6 D7 D6 C7 C6 B7 B6 A7 A6 

The intended advantage here was that I would only need to do one read from each EBR block, but since there are one fewer EBR blocks than bits in a byte, one of the blocks must perform double-duty and this advantage is lost. And I would still need some kind of divide by nine address logic, with different logic for some EBRs than others. I’m actually not sure whether this approach would even work.

I feel like there should be some not-too-complex scheme to store the full 63 Kbits of data in a way allowing for 8-bit byte retrieval, but I can’t quite find it.

