
Archive for the 'Yellowstone' Category

FPGA Block RAM Packing

In an earlier blog post, I was lamenting how one-ninth of an FPGA block RAM was wasted when storing 8-bit ROM data, because there’s no simple way to make use of the 9th parity bit in each word of a block RAM. Horrors! To fight this injustice, I’ve developed a solution that I call packed ROM. It stores nine 8-bit bytes in eight 9-bit words of block RAM, and provides an interface to read the data as if it were an 8-bit memory with a larger depth. Using this method, I’m able to store 1152 bytes of read-only data per block RAM instead of only 1024. The solution relies on the fact that the block RAMs are dual port – you can read from two different addresses simultaneously. Compared with using the same number of block RAMs as a standard 8-bit wide ROM, this solution consumes an extra 54 LUT4s in a MachXO2-1200 FPGA – about 4 percent of the total. It increases the MachXO2-1200’s effective capacity for this type of 8-bit ROM data from 7168 to 8064 bytes.

Here’s the Verilog code, as well as a Python program that reads a plain binary file and writes a “packed” file in .mem format. The code assumes 7 block RAMs, but should be easily adaptable to other numbers.

module packedROM #(parameter NUM_BLOCK_RAMS = 7) (
    input [12:0] addr,
	input clk,
	output reg [7:0] Q
    );
	
	// packs 1152*NUM_BLOCK_RAMS 8-bit data bytes into 1024*NUM_BLOCK_RAMS 9-bit words 
	// uses 54 LUT4s of the MachXO2
	// may need to change addr width depending on NUM_BLOCK_RAMS. Use $clog2()? 
	
	// nine bytes A-I are packed into eight 9-bit words as follows:
	// 0: I3 I2 I1 I0 A4 A3 A2 A1 A0
	// 1: I7 I6 I5 I4 B4 B3 B2 B1 B0
	// 2: F7 F6 E7 E6 C4 C3 C2 C1 C0
	// 3: H7 H6 G7 G6 D4 D3 D2 D1 D0
	// 4: A7 A6 A5 E5 E4 E3 E2 E1 E0
	// 5: B7 B6 B5 F5 F4 F3 F2 F1 F0
	// 6: C7 C6 C5 G5 G4 G3 G2 G1 G0
	// 7: D7 D6 D5 H5 H4 H3 H2 H1 H0
	// bytes A-H are sequential in the byte-oriented address space below addr 1024*NUM_BLOCK_RAMS
	// byte I is one of the "extra" bytes, in byte-oriented address space beyond addr 1024*NUM_BLOCK_RAMS
	
	reg [12:0] wordAddressA;
	reg [12:0] wordAddressB;
	
	wire [8:0] QA;
	wire [8:0] QB; 
	
	// dualPortROM is a wrapper for the MachXO2 block RAMs, created by the Lattice IP Express tool.
	// it is actually a dual port RAM with the write input unused
	dualPortROM myDualPortROM(
		.DataInA(9'b000000000),
		.DataInB(9'b000000000),
		.AddressA(wordAddressA),
		.AddressB(wordAddressB),
		.ClockA(clk),
		.ClockB(clk),
		.ClockEnA(1'b1),
		.ClockEnB(1'b1),
		.WrA(1'b0),
		.WrB(1'b0),
		.ResetA(1'b0),
		.ResetB(1'b0), 
		.QA(QA),
		.QB(QB)
	);
	
	wire [12:0] overflowAddr = addr - (NUM_BLOCK_RAMS * 1024);
			
	always @* begin
		if (addr < NUM_BLOCK_RAMS * 1024) begin
			// packed area, bytes A-H
			wordAddressA <= addr;
			// word address for the upper bits depends on low three bits of the byte address
			case (addr[2:0])
				0: begin // A
					wordAddressB <= { addr[12:3], 3'b100 };
					Q <= { QB[8:6], QA[4:0] };	
					end
				1: begin // B
					wordAddressB <= { addr[12:3], 3'b101 };
					Q <= { QB[8:6], QA[4:0] };	
					end
				2: begin // C
					wordAddressB <= { addr[12:3], 3'b110 };
					Q <= { QB[8:6], QA[4:0] };	
					end
				3: begin // D
					wordAddressB <= { addr[12:3], 3'b111 };
					Q <= { QB[8:6], QA[4:0] };	
					end	
				4: begin // E
					wordAddressB <= { addr[12:3], 3'b010 };
					Q <= { QB[6:5], QA[5:0] };	
					end		
				5: begin // F
					wordAddressB <= { addr[12:3], 3'b010 };
					Q <= { QB[8:7], QA[5:0] };	
					end											
				6: begin // G
					wordAddressB <= { addr[12:3], 3'b011 };
					Q <= { QB[6:5], QA[5:0] };	
					end						
				7: begin // H
					wordAddressB <= { addr[12:3], 3'b011 };
					Q <= { QB[8:7], QA[5:0] };	
					end
			endcase
		end
		else begin
			// overflow area, byte I
			// word address is byte overflow address times 8 for the lower bits, and times 8 plus 1 for the upper bits
			wordAddressA <= { overflowAddr[9:0], 3'b000 };
			wordAddressB <= { overflowAddr[9:0], 3'b001 };
			Q <= { QB[8:5], QA[8:5] };
		end
	end
endmodule
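
For reference, here's a minimal sketch of how packedROM might be instantiated and read. The wrapper module and signal names below are illustrative, not part of the actual design, and it assumes the block RAM is configured without an extra output register, so the data appears one clock after the address is presented.

// Hypothetical usage example: the wrapper module and signal names are illustrative only.
module packedROMExample (
    input clk,
    input [12:0] codeAddr,    // byte-oriented address, 0 to 1152*7-1
    output [7:0] codeData     // byte read from the packed ROM
);
    // data for codeAddr is valid after the following rising clock edge,
    // because the block RAM registers the address internally
    packedROM #(.NUM_BLOCK_RAMS(7)) myPackedROM (
        .addr(codeAddr),
        .clk(clk),
        .Q(codeData)
    );
endmodule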




import os
from array import array

infile = "coderom.bin"
outfile = "coderom.mem"

num_block_rams = 7
outsize = 1024 * num_block_rams   # number of 9-bit words in the packed ROM

# read the plain binary ROM image
# the input file must contain at least 1152 * num_block_rams bytes
inputData = array('B')
insize = os.path.getsize(infile)
with open(infile, 'rb') as f:
    inputData.fromfile(f, insize)

# write one packed 9-bit word per line, in hex, following the layout in the Verilog comments
with open(outfile, "w") as out:
    for x in range(outsize):
        baseAddr = x & ~7             # byte address of byte A for this group of eight words
        if x & 7 == 0:    # I3-I0 | A4-A0
            out.write('{:02X}\n'.format((((inputData[outsize + baseAddr // 8]) & 0xF) << 5) | ((inputData[baseAddr]) & 0x1F)))
        elif x & 7 == 1:  # I7-I4 | B4-B0
            out.write('{:02X}\n'.format((((inputData[outsize + baseAddr // 8]) & 0xF0) << 1) | ((inputData[baseAddr + 1]) & 0x1F)))
        elif x & 7 == 2:  # F7-F6 E7-E6 | C4-C0
            out.write('{:02X}\n'.format((((inputData[baseAddr + 5]) & 0xC0) << 1) | (((inputData[baseAddr + 4]) & 0xC0) >> 1) | ((inputData[baseAddr + 2]) & 0x1F)))
        elif x & 7 == 3:  # H7-H6 G7-G6 | D4-D0
            out.write('{:02X}\n'.format((((inputData[baseAddr + 7]) & 0xC0) << 1) | (((inputData[baseAddr + 6]) & 0xC0) >> 1) | ((inputData[baseAddr + 3]) & 0x1F)))
        elif x & 7 == 4:  # A7-A5 | E5-E0
            out.write('{:02X}\n'.format((((inputData[baseAddr]) & 0xE0) << 1) | ((inputData[baseAddr + 4]) & 0x3F)))
        elif x & 7 == 5:  # B7-B5 | F5-F0
            out.write('{:02X}\n'.format((((inputData[baseAddr + 1]) & 0xE0) << 1) | ((inputData[baseAddr + 5]) & 0x3F)))
        elif x & 7 == 6:  # C7-C5 | G5-G0
            out.write('{:02X}\n'.format((((inputData[baseAddr + 2]) & 0xE0) << 1) | ((inputData[baseAddr + 6]) & 0x3F)))
        else:             # D7-D5 | H5-H0
            out.write('{:02X}\n'.format((((inputData[baseAddr + 3]) & 0xE0) << 1) | ((inputData[baseAddr + 7]) & 0x3F)))


When 64 Kbits Is Not 8 Kbytes

This FPGA-based disk controller project is going to need every byte of on-chip memory that I can scrounge up. The datasheet says my Lattice MachXO2-1200 has 64 Kbits of embedded RAM (EBR). See the shaded column in the table above. 64 Kbits is 8 Kbytes, and I plan to store 8 KB of 6502 program code, so that looks perfect. Except that I misinterpreted the table in two different ways.

Looking more closely at the table, there are 7 EBR blocks and each block is 9 Kbits. That’s a total of 63 Kbits, not 64. The datasheet is just wrong here, or they’re using some very liberal rounding method. I just lost 1 Kbit!

That’s not the worst of it. Later in the datasheet, it mentions that each EBR block can be configured as 8192 x 1, 4096 x 2, 2048 x 4, or 1024 x 9. Only the last of those configurations represents 9 Kbits of data. If you want to store 8-bit wide data, you have to use the 1024 x 9 configuration and throw away the extra bit. So that’s another 1 Kbit lost from each block.

When you add everything up, if you’re storing 8-bit wide data, the MachXO2-1200 can only store 56 Kbits in EBR rather than the advertised 64 or 63 Kbits. That may sound like a small difference, but it will have a big impact on my design.

Sure, I could add an external SRAM, and maybe I’ll do that eventually, but I really want to squeeze all the advertised memory space from this chip. The wasted area offends my engineering sensibilities. So I’ve been brainstorming a couple of crazy solutions, and wondering if anyone else has ever tried something similar.

I could pack the 8-bit byte data like a bitstream into consecutive 9-bit words of an EBR block, so nine bytes A through I would be stored in eight words:

EBR0[0]: B0 A7 A6 A5 A4 A3 A2 A1 A0
EBR0[1]: C1 C0 B7 B6 B5 B4 B3 B2 B1
EBR0[2]: D2 D1 D0 C7 C6 C5 C4 C3 C2
EBR0[3]: E3 E2 E1 E0 D7 D6 D5 D4 D3
EBR0[4]: F4 F3 F2 F1 F0 E7 E6 E5 E4
EBR0[5]: G5 G4 G3 G2 G1 G0 F7 F6 F5
EBR0[6]: H6 H5 H4 H3 H2 H1 H0 G7 G6
EBR0[7]: I7 I6 I5 I4 I3 I2 I1 I0 H7

To read a byte would require reading two separate words, and then doing some bit shifting and masking with the results. I’d need some kind of state machine and an appropriate clock to handle the two separate reads. And I’d need some kind of divide by nine logic (or eight-ninths?) to convert a byte-oriented address to the corresponding 9-bit word address.
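
To make the address math concrete, here's a rough sketch of the divide-by-nine decomposition this scheme would need (my own illustration, not working code from any design). Byte N starts at absolute bit offset 8*N, so it lives in word 8*N/9 starting at bit 8*N mod 9, possibly spilling into the next word:

// Hypothetical address decomposition for the bitstream-style packing above.
// The constant divide and modulo by 9 are exactly the logic I'd rather avoid;
// a synthesizer turns them into a sizeable pile of LUTs.
module bitstreamAddr (
    input  [12:0] byteAddr,      // byte-oriented address
    output [12:0] wordAddrLow,   // 9-bit word holding the low bits of the byte
    output [12:0] wordAddrHigh,  // next word, holding any spilled high bits
    output [3:0]  bitOffset      // starting bit position within wordAddrLow
);
    wire [15:0] absBit = byteAddr * 8;   // absolute bit offset of the byte
    assign wordAddrLow  = absBit / 9;
    assign bitOffset    = absBit % 9;
    assign wordAddrHigh = wordAddrLow + 13'd1;
    // after reading both words, the byte is bits [bitOffset+7 : bitOffset]
    // of the 18-bit value {highWord, lowWord}
endmodule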

A second idea is to leverage the fact that these are seven separate EBR blocks, and to read from them all in parallel, combining one or two bits from each to reassemble the byte:

EBR0[0]: I0 H0 G0 F0 E0 D0 C0 B0 A0
EBR1[0]: I1 H1 G1 F1 E1 D1 C1 B1 A1
EBR2[0]: I2 H2 G2 F2 E2 D2 C2 B2 A2
EBR3[0]: I3 H3 G3 F3 E3 D3 C3 B3 A3
EBR4[0]: I4 H4 G4 F4 E4 D4 C4 B4 A4
EBR5[0]: I5 H5 G5 F5 E5 D5 C5 B5 A5
EBR6[0]: E6 D7 D6 C7 C6 B7 B6 A7 A6 

The intended advantage here was that I would only need to do one read from each EBR block, but since there are one fewer EBR blocks than bits in a byte, one of the blocks must perform double-duty and this advantage is lost. And I would still need some kind of divide by nine address logic, with different logic for some EBRs than others. I’m actually not sure whether this approach would even work.

I feel like there should be some not-too-complex scheme to store the full 63 Kbits of data in a way allowing for 8-bit byte retrieval, but I can’t quite find it.


Squeezing FPGA Memory

I’m developing an Apple II disk controller that’s based on the UDC disk controller design. The original UDC card had 8K of ROM and 2K of RAM, so it needs 10K of combined memory. The FPGA device I’m using for prototyping, a Lattice MachXO2-1200, has 8K of embedded block RAM and 1.25K of distributed RAM. It also has 8K of “user flash memory”. So will the UDC design fit? It’s close, but I think the answer is no.

At first I thought I could store the ROM data in the FPGA’s UFM section, but that doesn’t look promising. I can store the data there, but compared to embedded block RAM, accessing UFM is inconvenient and probably impractical for live execution of 6502 code. Accessing the UFM requires setting up a Wishbone interface in the FPGA’s Verilog code, starting a memory transaction, and reading out an entire page of flash (16 bytes). It’s also pretty slow. I don’t think it’ll be possible to read an arbitrary byte of UFM and return it to the CPU within ~500 ns, as would be required for directly executing code from it.

OK, so no UFM. Maybe I can store the 8K of ROM data in EBR, using RAM to hold what’s technically ROM? That would work, but it would leave only 1.25K of distributed FPGA RAM remaining to implement the required 2K of RAM for the disk controller. It’s 768 bytes short. No good.

I could switch to a larger FPGA with more memory, or add a separate RAM or ROM chip. But that would increase cost and complexity, and anyway wouldn’t help with my prototype board that’s already built.

 
Stupid Idea #1

From my analysis of the UDC ROM, I think the upper half of the card’s RAM is only used when communicating with Smartport drives. So I might be able to reduce the RAM from 2K to 1K, and at least I’d be able to test whether 3.5 inch and 5.25 inch drive support works. Using 8K of EBR and 1K of distributed RAM, I’d have a whopping 256 bytes of RAM left. Will it work? I think distributed RAM just means using the FPGA’s logic resources as RAM, so this approach would use 80% of the FPGA’s logic resources and only leave 20% remaining for the actual card functionality, like the IWM model and other logic. It might work, it might not.

 
Stupid Idea #2

The 8K of ROM isn’t one large chunk. It’s divided into 1K banks that can be mapped into a single 1K region of the computer’s address space. There’s already a small code routine to facilitate the bank switching. What if I could somehow make this routine copy the desired 1K block from UFM to EBR at the moment it’s needed? Then I’d have 8K in UFM, with a 1K cache in EBR, and the 2K of RAM also in EBR.

This would definitely fit, but there would be a delay every time code execution moved to a different 1K ROM page. How long does it take to move 1024 bytes from UFM to EBR? I’m not sure, but I’ll guess it’s tens to hundreds of microseconds. Will that cause problems? Maybe. Will this approach be a pain to implement? Definitely.

 
Stupid Idea #3

From what I’ve observed of the ROM code, bank 1 contains 5.25 inch functions plus 3.5 inch formatting. Bank 3 is exclusively for 3.5 inch stuff, and bank 7 is exclusively for Smartport drives. Maybe I could temporarily remove some parts of the ROM, in order to make it all fit? Then I might be able to test all the different types of supported disk drives, just not all at the same time.

 
Stupid Idea #4

Maybe I can modify the ROM code to use 2K of the Apple II’s own RAM instead of 2K of onboard RAM? Then everything would fit in the FPGA. But there must be a good reason the UDC designers didn’t do this. What 2K region of Apple II RAM is safe to use, and wouldn’t get overwritten by running software? I’m not sure.

 
Stupid Idea #5

Maybe I can modify the prototype board somehow, and graft an extra RAM or ROM chip on there for testing purposes? Maybe I can add a second peripheral card and somehow use its RAM or ROM? Now these ideas are getting crazy.

 
What’s the Long-Term Solution?

None of these ideas except #2 are workable as a long-term solution, if I eventually move ahead with manufacturing this disk controller card. So what path makes the most sense in the long-term?

Stepping up to the MachXO2-2000 would add about $2 in parts cost, which maybe doesn’t sound like much, but it’s significant. The XO2-2000 has 9.25K of EBR and 2K of distributed RAM, so the design should fit with a small amount of room to spare. That’s surely the least-effort solution.

I could keep the MachXO2-1200 and add a separate 2K RAM chip. The 8K of ROM would fit in the 1200’s EBR. The combined cost might be slightly lower than the MachXO2-2000, but the design and layout would become more complex, and I’m not sure it’s worth it.

I could step down to the MachXO2-640 (2.25K EBR, 640 bytes distributed RAM) and add a separate ROM chip. Total cost would be slightly less than a MachXO2-1200, and I’d also gain lots of extra ROM space for implementing extra features or modes. That would be great. Like adding a separate RAM chip, the extra ROM would make the board design and layout somewhat more complex. But the biggest drawback would be for manufacturing or reprogramming, because both the FPGA and the ROM would need to be programmed separately before the card could be used. Or maybe the FPGA could program the ROM somehow, but it would still be cumbersome and far less attractive than a single-chip programming process.

I never imagined a shortage of just 768 bytes could make such a difference. What an adventure!


Unlocking the Secrets of the UDC Disk Controller

I concluded my last post about my Yellowstone disk controller for Apple II by saying I would probably support intelligent Smartport drives and 5.25 inch drives, but not 3.5 inch drives. But since then I’ve been busy investigating the possibility of a multi-purpose disk controller that could support all three drive types, by studying the Universal Disk Controller (UDC) that was marketed by Laser, VTech, and CPS. I’ve made some good progress, but big mysteries still remain.

There were two physical versions of the UDC: the original “long” version and the later “short” version. There were also at least four different firmware versions, and probably many more. The UDC’s capabilities may have changed substantially between versions. I have ROM dumps for firmware versions $21, $23, $30, and $40. Initially I thought these hex values for version number should be converted to decimal for interpretation, so for example $21 meant human-readable version 3.3. But it now seems more likely we’re intended to just put a decimal point between the two hex nibbles, and so I have firmware versions 2.1, 2.3, 3.0, and 4.0. The 2.x versions are from long UDCs and the 3.0 and 4.0 are from short UDCs.

 
ROM Spelunking

Unfortunately I don’t have any UDC cards! All of my investigation is based on analysis of these ROM dumps and contemporary documentation.

Making sense of the ROM is a slow and tedious task. It’s quite possibly the most time invested for the least reward of anything I’ve ever attempted. Running the ROM dump through a 6502 disassembler produces thousands of lines of output like this:

CF9B   AD CF CA   LDA $CACF
CF9E   C5 41      CMP $41
CFA0   F0 37      BEQ $CFD9
CFA2   90 42      BCC $CFE6
CFA4   E5 41      SBC $41
CFA6   8D DD CB   STA $CBDD
CFA9   A9 0B      LDA #$0B
CFAB   20 F8 C9   JSR $C9F8
CFAE   A9 0E      LDA #$0E
CFB0   20 F8 C9   JSR $C9F8
CFB3   BD 8D C0   LDA $C08D,X
CFB6   BD 81 C0   LDA $C081,X
CFB9   BD 8E C0   LDA $C08E,X
CFBC   10 F5      BPL $CFB3
CFBE   CE DD CB   DEC $CBDD
CFC1   D0 EB      BNE $CFAE
CFC3   AD C6 CA   LDA $CAC6
CFC6   29 20      AND #$20
CFC8   D0 0A      BNE $CFD4
CFCA   A9 01      LDA #$01
CFCC   20 02 CA   JSR $CA02
CFCF   BD 8E C0   LDA $C08E,X
CFD2   30 FB      BMI $CFCF
CFD4   A0 00      LDY #$00
CFD6   88         DEY
CFD7   D0 FD      BNE $CFD6
CFD9   A5 41      LDA $41
CFDB   8D CF CA   STA $CACF
CFDE   AC D4 CB   LDY $CBD4
CFE1   99 D2 CB   STA $CBD2,Y
CFE4   18         CLC
CFE5   60         RTS
CFE6   A5 41      LDA $41
CFE8   38         SEC
CFE9   ED CF CA   SBC $CACF
CFEC   8D DD CB   STA $CBDD
CFEF   A9 0F      LDA #$0F
CFF1   D0 B8      BNE $CFAB

It looks indecipherable. But I can replace a few known memory addresses with symbols, such as all the references to addresses in the $C08x range, which are IWM latch addresses on the Apple II. I can examine the subroutines that are called from JSR instructions, infer what some of the simplest ones are doing, and replace their addresses with symbolic names. The referenced memory locations like $41 and $CACF are clearly state variables of some kind. I can look for all the places where those locations are used or modified, and begin to guess what they’re used for, and give them symbolic names too.

Eventually some parts of the code will become more readable. This helps me to make better inferences about other code that references the newly-readable parts. This process repeats in a sort of recursive fashion, until after many many hours and thousands of lines of opaque code analysis, the code above is transformed into something like this:

; ROMFUNC 81 - Seek
; seek to track for 3.5 drive
L4CF9B   LDA CUR_TRK          ; current track 
         CMP WANT_TRK         ; desired track 
         BEQ L4CFD9
         BCC L4CFE6
         SBC WANT_TRK         ; subtract to get the number of tracks to step
L4CFA6   STA $CBDD            ; init the step counter
         LDA #$0B             ; write drive register 0100: step direction towards track 0
L4CFAB   JSR WRREG35
L4CFAE   LDA #$0E             ; write drive register 0001: perform step 
         JSR WRREG35
L4CFB3   LDA SENSE_ON,X
         LDA PH0_ON,X         ; redundant? PH0 was already on
         LDA WRITE_OFF,X      ; check if step is completed
         BPL L4CFB3           ; keep waiting if the step isn't yet completed
         DEC $CBDD            ; decrement the step counter
         BNE L4CFAE           ; loop back if there are more steps yet to be done
         LDA CUR_DRV_FLGS
         AND #$20             ; mask bit 5 of the drive flags (changed speed zones)
         BNE L4CFD4           ; skip ahead if flag is 1
         LDA #$01             ; read drive register 1110: READY flag
         JSR RDREG35	
L4CFCF   LDA WRITE_OFF,X
         BMI L4CFCF           ; keep waiting if the drive is not ready
L4CFD4   LDY #$00             ; busy loop delay
L4CFD6   DEY                  ; 256
         BNE L4CFD6           ; times
L4CFD9   LDA WANT_TRK
         STA CUR_TRK          ; update the current track number
         LDY DRV_NUM          ; maybe drive number?
         STA CUR_TRK_TAB,Y    ; set the current track number in the drive table
         CLC                  ; carry value 0 means OK/success
         RTS
L4CFE6   LDA WANT_TRK
         SEC
         SBC CUR_TRK          ; reverse subtract to get the number of tracks to step
         STA $CBDD            ; init the step counter
         LDA #$0F             ; write drive register 0000: set step direction towards track 79
         BNE L4CFAB

 
Intelligent Smartport Drive Support – The Phantom Feature

Firmware versions 2.1 and 2.3 from the long UDC are nearly identical. They contain support for intelligent Smartport drives, like Floppy Emu’s Smartport hard disk emulation mode, or the Unidisk 3.5 drive. I’ve looked at the Smartport support code in detail, and it seems correct and complete. And yet… there are many sources on the web saying the UDC doesn’t support intelligent Smartport drives, and connecting one will damage the drive or the card. Hmm.

The 2.x firmware versions don’t seem to have support for daisy-chained drives. That’s a big disappointment, since I would definitely like Yellowstone to support daisy-chaining if possible.

There are many things in this firmware that look “not quite right”. I see unnecessarily convoluted code, limitations, questionable assumptions, and possible bugs. It could be that I just don’t understand the code fully enough, but it really looks like it was written by somebody who didn’t totally understand what they were doing, or was in a rush, or was just not a very good programmer.

As best as I can tell, the long UDC corresponds to this version of the instruction manual, which says it supports up to two drives on two separate connectors. It’s sort of vague about intelligent Smartport or Unidisk 3.5 support.

Being somewhat disappointed in the 2.x firmware and its lack of daisy-chaining support, I began to analyze firmware 4.0. But after only a few hours I realized something terrible: it has no support for intelligent Smartport drives! And neither does firmware 3.0. The Smartport support that was there in firmware 2.x is gone.

Why, WHY would they remove previously-existing support for Smartport drives? It doesn’t make sense.

The short UDC appears to correspond to this version of the instruction manual, which says it supports daisy-chaining and up to four drives. Again, it’s sort of vague about intelligent Smartport support.

According to sources I’ve read, later models of the Laser 128 computer contain an integrated UDC as a floppy drive controller, and these computers do support intelligent Smartport drives. This Australian web page has some helpful info if you search for “UDC”, about halfway down the page. There’s also the Laser 128 manual, where the chapter on disk I/O has a detailed discussion about the different types of drives and specifically lists Unidisk 3.5 as one of the supported drive types for the Laser 128. Which UDC version is that?

 
Still Searching for Answers

So that’s where I am. I’d like to make a disk controller card that handles all three Apple II drive types, and daisy-chaining, similar to the Apple IIGS. UDC firmware 2.x seems to support Smartport drives (although there’s some question about this), but doesn’t support daisy-chaining, and overall looks a bit rough. UDC firmware 3.0 and 4.0 support daisy-chaining, but support for Smartport drives was removed. And the Laser 128 contains an integrated UDC that reportedly supports Smartport drives, but its daisy-chaining capabilities are uncertain. Clear as mud. Where do I go next?

Maybe firmware 3.0 and 4.0 do contain support for Smartport drives, but it’s so cleverly obfuscated that I missed it? That seems very unlikely. Maybe I could take the code for Smartport drives from firmware 2.3 and somehow add it to firmware 4.0 to create a version that does everything? That sounds extremely difficult – even after my marathon code analysis, I don’t understand the details well enough to attempt something like that. Maybe I need a ROM dump from a Laser 128? Or maybe I should forget about the UDC altogether, and take up a new hobby? 🙂


FPGA Disk Controller Next Steps

After more than two years of sporadic effort, my Yellowstone FPGA-based disk controller card for Apple II is finally working. That means the fundamental disk control capabilities are there, but there’s still a great deal of work left to do. Now I’m at a crossroads, and must decide what else makes sense to add, and what I’m genuinely interested to pursue. So what’s next?

In its current state, the card can function in one of two modes. Mode one is a work-alike Apple Liron disk controller, which is compatible with intelligent disk drives like the Unidisk 3.5 and the BMOW Floppy Emu’s Smartport hard disk emulation. The best use of Liron mode is probably adding 32 MB hard disks to an Apple II+ or IIe with a Floppy Emu.

Mode two has the functionality of the standard Disk II controller for 5.25 inch floppy drives. That’s maybe less exciting since virtually everyone already has one of these, but there are plenty of uses for a second 5.25 inch disk controller. eBay’s supply of original Disk II controllers is shrinking, and prices are climbing, so it’s helpful to have an alternative. There’s still some work remaining to finish Yellowstone’s support for 5.25 inch floppy writing (reading is finished), but I don’t anticipate any major difficulties there.

 
Electronics and Mechanics

My first task for version 2 is to address a lengthy list of board changes. Most of these won’t change the card’s behavior, but they’ll help it to work more reliably and safely, and provide for future improvements. These changes include things like adding termination resistors and bus drivers to isolate the FPGA, a bigger voltage regulator, test points for all the important signals, more capacitors in different places, improved power/ground routing, and adding a second disk connector.

Some helpful Apple II bus signals aren’t connected correctly, or aren’t connected at all, so I’ll need to fix that. A hardware solution for self-programming needs to be designed and added too. There’s lots of work to do in this category, and it could keep me busy for weeks. That’s frustrating when all I want to do is develop new features, but taking care of the card’s electronic fundamentals is important.

 
Drive Type Auto-detection

The two modes are selectable with a jumper on the card. It’s either a Liron or a Disk II controller. It would be nice to merge these somehow, and auto-detect the type of attached drives. A basic solution would auto-configure the card into one of the two modes. A more complex solution would create a hybrid mode that could support Smartport drives and 5.25 inch drives at the same time on different disk connectors.

I’m not sure how to do either of those, especially the hybrid mode, which I think would require some detailed research into how typical software boots and what assumptions it makes about the card it’s booting from. From what I’ve read, some software assumes it’s booting from a Disk II card, and jumps to specific addresses in the card’s onboard ROM to help load sectors during the early boot process. This won’t work if my card’s ROM contains some custom hybrid Liron/DiskII code. Hopefully there’s a clever solution to this, like retaining entry points for Disk II compatibility at a few key addresses in the ROM code.

 
Attaching Drives

What’s the best way to attach drives to this card? It might have a single DB-19 female connector, and support a daisy-chain of several drives, like the built-in disk connector on the Apple IIGS. Or it might have two 10×2 rectangular connectors for ribbon cables, like the connectors on the Floppy Emu and on the Disk II controller card. Or it might even have two DB-19 female connectors.

The 10×2 rectangular connector is probably the best option, simply because female DB-19 connectors are so hard to find. I have a small supply, which I use to manufacture the BMOW Daisy Chainer for Floppy Emu. But pretty soon those will all be gone, and then the DB-19F will be extinct unless somebody wants to spend $15000+ for a Chinese factory to make new ones.

A compromise solution would be to use 10×2 rectangular connectors on the card, but design an optional rectangular-to-DB19F adapter. That way a female DB-19 would only be used where it’s needed. At the moment, that’s only for connection to a Unidisk 3.5 drive or the slim 5.25 inch drives (I forget what these are called… also Unidisk?). Disk II drives and the Floppy Emu use the 10×2 rectangular connector and don’t require a female DB-19.

I mentioned daisy chaining, but I’m not sure how that would be implemented in software. From my work on Floppy Emu, I’m familiar with how daisy chaining is implemented electronically for the drives, but I don’t know how the card’s ROM code detects and keeps track of all the drives in the chain. Daisy-chaining also means moving away from the simple “Slot X Drive Y” scheme to an environment where a disk controller card can have more than two drives attached, which somehow get mapped to other virtual slots. Yes there’s documentation for this, but it’s just one more challenge added to the pile.

 
Firmware Updates

All the interesting parts of the Yellowstone card are implemented in an FPGA, and I expect the FPGA design will be updated over time to fix bugs and add new features. Ideally there should be a way to update the FPGA for a Yellowstone card that’s in the field, without requiring the Lattice design IDE and JTAG programming hardware.

I haven’t confirmed this, but I think there’s a way to export the FPGA firmware from the Lattice IDE as a JTAG player file – basically a sequential list of JTAG commands. Then a stand-alone third-party hardware/software solution should be able to update the FPGA. In this case, that solution could be the Apple II itself. I need to design a way to bit-bang the JTAG signals from the Apple II, possibly using the game port, or even just using the address and data bus. It may be very slow, but it should work. Unfortunately the FPGA can’t help with this, since it will be in the midst of being reprogrammed, and so I’ll probably need to include some additional hardware on the card to support this self-programming.

One drawback of a self-programming approach is that the entire FPGA player file must be small enough to fit entirely in the Apple II’s RAM. It can’t be loaded piecemeal from the disk, because the disk controller will be non-functional while it’s being reprogrammed. This problem could be circumvented by using a second disk controller, but that doesn’t seem very elegant.

 
Support for 3.5 Inch Drives

The biggest unknown is potential support for unintelligent (dumb) 3.5 inch floppy drives, like the Apple 3.5 Drive A9M0106. From a completionist point of view, this would be great, because it would bring Yellowstone support for all three major types of Apple II disk drives. But there are some good reasons to omit it. In brief, it would be very complex and not very useful.

What is useful? While it would be nice to have, I believe there isn’t a strong need for a dumb 3.5 inch floppy controller card like this. If the main audience for the Yellowstone card is the Apple II+ and IIe, there just isn’t very much II+/IIe software on 3.5 inch disks. And even where software is on 3.5 inch disks, it could already be supported using the Yellowstone card in Liron mode with the Floppy Emu in Unidisk 3.5 emulation mode. The only case where dumb 3.5 drive support would be needed is connecting a real Apple 3.5 Drive A9M0106 to a II+ or IIe.

I would add 3.5 support anyway if it were easy, but it’s not. The Disk II card is basically a proto-IWM, and fairly easy to replicate in an FPGA. The Liron card is a full IWM with some extra bits of address decoding and a larger ROM. But the Apple 3.5 Disk Controller is crazy complex with its own onboard 6502 CPU, 32K of onboard RAM, a SWIM, and gobs of programmable logic. I don’t want to attempt replicating something as difficult as that, and I don’t even have an Apple 3.5 Disk Controller to examine. So… no.

A slightly more plausible path would be to follow the example of the Universal Disk Controller (UDC) that was marketed by Laser, VTech, and CPS. The fundamental problem of 3.5 inch disk support on an Apple II+ or IIe is that a 1 MHz 6502 isn’t fast enough to keep up with the bit rate of a 3.5 inch disk. The Apple 3.5 Disk Controller solves this problem by essentially putting an entire second computer on the disk controller. The UDC takes a different approach, using the regular Apple II CPU, but halting it at key times with the RDY signal instead of expecting the code to do busy wait loops. The elimination of busy waiting saves just enough CPU cycles that the 1 MHz CPU can handle the faster bit rate – or so I understand.
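
Just to illustrate the concept (this is purely my guess at the mechanism, not anything recovered from the actual UDC), the RDY trick might look something like this: the card decodes a read of its disk-data location, and if the IWM hasn't assembled a complete byte yet, it holds RDY low so the 6502 freezes in the middle of the read cycle until the byte arrives.

// Purely speculative sketch of the RDY-halting idea, not derived from the UDC.
// The 6502 only honors RDY during read cycles, which is fine here, since
// fetching the next disk byte is a read.
module rdyHalt (
    input  dataByteSel,   // CPU is reading the card's disk-data location
    input  byteReady,     // IWM has a complete byte available
    output RDY            // 6502 RDY line: low = stretch the current cycle
);
    // hold the CPU only while it's waiting on disk data; otherwise release
    // the line (it's pulled up on the motherboard)
    assign RDY = (dataByteSel && !byteReady) ? 1'b0 : 1'bz;
endmodule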

The UDC is interesting in that it’s a hybrid 3.5 / 5.25 inch drive controller. But duplicating the UDC would be no simple task. There are two versions, one with a hairy mass of 7400-series logic and the other with a single ASIC. Here’s an image of the “long” version with the 7400-series logic:

I don’t have examples of either type, so my efforts would be limited to examining photos and reverse-engineering the card’s ROM. I spent some time examining the ROM code, and it’s very complex. It appears to be an 8-way bank-switched ROM, with 2-way bank-switched onboard RAM, and it makes extensive use of self-modifying code. Here be dragons.

Help! If anybody has a UDC card they’d be willing to lend or sell, please let me know!

So for the time being at least, Yellowstone will offer Liron support (intelligent Smartport / hard disk) and 5.25 inch floppy drive support. Dumb 3.5 inch floppy drive support might come later as a version 2.0 type of feature.

 
Next Steps

That’s the state of everything as of today. There’s still a tremendous amount of work to do, but I’m happy to be making forward progress again. Do you have any suggestions or advice on where to go next, or how to address some of the challenges I’ve mentioned here? I’d love to hear it – please leave your feedback in the comments.


Yellowstone Back From the Dead

Remember the Yellowstone disk controller card that I designed back in 2018? It was an FPGA-based clone of the Apple II Liron controller, with aspirations to eventually become a universal reconfigurable disk controller. And it worked nicely when it was the only card installed in the computer, but things went haywire when too many other expansion cards were also present. I eventually gave up and abandoned the project, but now I think I’ve fixed it.

The symptoms were documented in a series of blog posts here, here, here, and here. The more expansion cards present along with Yellowstone, the more likely I was to see errors such as unexplained resets and lockups and drops into the Apple II system monitor. Investigation with an oscilloscope showed lots of nasty looking signals, huge over and undershoot on the data bus, and strange transients on the power supply during card I/O.

There were plenty of theories to explain the problem, and I received over 100 comments from helpful readers. Some theories put forward were: poor grounding, insufficient bulk capacitance, a too weak 3.3V regulator, impedance mismatch, bus fighting, failure to meet the minimum input high voltage, a too strong or too weak drive from the bus driver IC, wrong FPGA slew rate, bad scope probes, bad power supply in the Apple II, a race condition in the logic, and more. My own best guess was a combination of grounding and impedance problems. I spent many weeks chasing various theories without much success. I hacked the card and replaced its bus driver with one from a different logic family. I even started to wonder whether the whole problem lay with the computer rather than the card. By March 2018 I gave up in frustration.

 
Two and a Half Years Later

This past week I’ve been investigating ideas for an Apple II video card, and a reader pointed me to this tech note about Apple IIgs expansion card design. The point was to learn which signals were provided to the different slots, but my eye caught a different paragraph titled “Avoiding Bus Fights”. As the text described,

“To avoid potential (or actual) bus fights, it is helpful to avoid driving read data from an expansion card onto the bus immediately after PH0 rises. … If a card drives data onto the expansion slot data bus immediately after PH0 rises, there may be a bus fight between the expansion card trying to drive the bus, and the Apple IIGS (or Apple IIe) bus buffers, which may not have turned around yet. … Developers can avoid bus fights by simply using 74LS or 74HCT series parts and relying upon typical delay stackups to delay driving the data bus for approximately 30 nanoseconds. A more solid technique is using the first rising edge of the 7M clock, after PH0 rises.”

My card responds when the Apple II asserts the I/O SELECT signal for its slot, which happens at the same time as PH0 rising. What this paragraph says is that the card should intentionally wait at least 30 ns before responding, because the motherboard’s 74LS245 bus driver is still driving the data bus even after PH0 rises and I/O SELECT is asserted!

At first glance this seems ridiculous. Why would the Apple II assert I/O SELECT for a card before it’s safe for the card to output data? But if the card is built with 1978-vintage ICs that can’t respond very quickly anyway, it isn’t a problem. The trouble only appears when you use an FPGA and modern logic families like 74LVC with propagation delays of just a few nanoseconds. It becomes necessary to add an artificial output delay to avoid bus fighting.
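
As an illustration of what the tech note suggests (a generic sketch with made-up signal names, not the actual Yellowstone code), the fix amounts to gating the card's bus driver on a 7M clock edge after PH0 rises:

// Illustrative sketch only; signal names are hypothetical, not from Yellowstone.
// The card's bus driver is enabled one 7M edge after PH0 is sampled high,
// giving the motherboard's 74LS245 time to stop driving the data bus.
module busDriveDelay (
    input clk7M,            // 7M clock from the slot
    input PH0,              // PH0 bus clock from the slot
    input ioSelected,       // card is selected and wants to drive read data
    output reg driveEnable  // output enable for the card's data bus driver
);
    reg ph0Sampled;

    always @(posedge clk7M) begin
        ph0Sampled  <= PH0;                        // PH0 as seen at the previous 7M edge
        driveEnable <= ioSelected && ph0Sampled;   // asserts at least one full 7M period after PH0 rises
    end
endmodule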

Several readers had suggested more or less exactly this, including Fluffysheap who described it perfectly in the comments here. But I must have been too frustrated or too tired back then, and I never fully followed through on checking this theory.

Bus fighting almost perfectly explains the horrible signals I observed on the scope. For a few tens of nanoseconds at the beginning of my card’s data output, it was fighting with the motherboard’s bus driver, creating eight short circuits on the 8-bit bus. This caused a surge of current, resulting in horrible power supply transients and wild swings on the bus. From my scope observations this period seemed to last about 70-80 ns, rather than the 30 ns mentioned in the tech note. But the tech note described the Apple IIgs, not the Apple IIe that I used for my tests. Maybe the Apple IIe bus driver is slower to shut off.

One thing that bus fighting doesn’t seem to explain is why adding more peripheral cards would make the problem worse. It appears my card was engaging in a bus fight with the motherboard’s own bus driver, and the other cards were just innocent bystanders. The only effect of their presence would be to increase the bus capacitance. I may still be missing something here.

 
Testing It

Armed with this newfound knowledge, I went to edit the Yellowstone FPGA source to insert an intentional delay before enabling the card’s bus driver for output. Lo and behold, code for creating a delay was already there, but commented out. It was written by me. I can’t remember if I ever tested it back in 2018. Maybe I had the idea but never tried it, or maybe I tried it but something went wrong. Either way, I gave thanks to 2018-Steve and just reapplied the already-existing code.

At first there was some comedy, because I tried several different changes that appeared to have no effect. After half an hour, I realized I was rebuilding the FPGA configuration file after each change, but then programming the old configuration file from 2018. Oops.

What can I say, it works. I loaded up my Apple IIe with a sampler of six different expansion cards in different combinations, connected to a variety of Floppy Emus and real drives, including a Smartport-aware Unidisk 3.5 drive. Everything worked as expected, and there were no unexplained resets or other weird behavior.

I looked at the data bus and the power supplies on the scope, and everything appeared cleaner than before. The power supplies looked OK. There was still some overshoot on the data bus when the card first started driving, but much less than before. Maybe this can be improved further by adding some small inline resistors on the next version of the card. I adjusted the output delay to about 120 ns, which is probably much longer than necessary, but it still leaves more than ample time for 2020-era logic chips to do their jobs.

 
One More Issue

Things look good with Yellowstone on the Apple IIe, but the IIgs is another story. I have to switch the IIgs to normal speed instead of fast, but then the card should work. A real Liron disk controller card works fine in a IIgs, as long as the system speed is changed, so there’s no fundamental incompatibility. Unfortunately the Yellowstone card just plain doesn’t work on the IIgs. At first I thought that was a result of my new output delay, but removing the delay didn’t help. Then I dug through all my notes from 2018 and concluded that the card never worked on the IIgs under any circumstances. So this isn’t a new problem – it’s an old problem I just hadn’t noticed because it was overshadowed by the bus fighting.

The sort-of-good news is that this failure of Yellowstone on the IIgs seems repeatable and debuggable. Instead of weird spurious resets and lockups like I was seeing two years ago, it looks like it’s just a communication error with the drive. The Yellowstone card firmware appears to be running OK, and I get reasonable error messages like “NO DEVICE CONNECTED”. With a Unidisk 3.5 drive attached, there’s no drive activity. With a Floppy Emu attached and configured for Smartport emulation, it reports a checksum error.

I suspect there’s another timing problem here, but this one relates to writes to the card instead of reads. Perhaps I’m not latching the data from the bus at the right time, and due to minor differences between the bus timing on the IIe and IIgs, it still works OK on the IIe but occasionally writes the wrong values to the IWM chip on the IIgs, or fails to write anything. That would cause garbled communication with the drive.

I’m calling a stop for the moment, feeling pleased with this new progress, and optimistic that I’ll eventually find an explanation for the IIgs behavior.

