
Archive for December, 2011

Floppy Emu Board Layout

Whew! It took me a long time to do the board layout for the floppy disk emulator, but here it is! The board is about 4 x 1.75 inches, or roughly the size of an elongated credit card. The resistors, LEDs, and odd-sized capacitors are all labeled, so any other small rectangular surface-mount parts you see are 0.1 uF decoupling capacitors. Assuming nobody sees any problems, I’ll be sending this off to be manufactured in a few more days.

The AVR microcontroller used is an ATMEGA1284P with 16K of internal RAM, running at 20 MHz from a 5V supply. It replaces the ATMEGA32u4 breakout board used in my breadboard prototype. The CPLD is a Xilinx XC9572XL, which replaces the Atmel board from the prototype (itself salvaged from an old Tiny CPU project). The CPLD runs at 3.3V, but has 5V tolerant inputs. A 74LVC244 performs 5V to 3.3V level conversion. Sitting on top of the chips is a Nokia 5110 graphical LCD.

There are several options for connecting the Floppy Emu board to the Mac. The board has a male DB-19 connector as well as a male rectangular 20-pin IDC connector (the same type as the internal floppy connector on the motherboard). So you can:

  1. Plug the board straight into the Mac’s external DB19 floppy port. Then it will hang off the back like a dongle.
  2. Use an Apple II Unidisk/DiskII DB19 to 20-pin IDC cable, like this one from IEC. Connect the DB19 end to the external floppy port, and the IDC end to the Floppy Emu board. I purchased one of these and tested it to confirm that it has all the necessary connections.
  3. Use the DB19 to 20-pin IDC cable from an external Apple 3.5 inch floppy drive.
  4. Unplug your internal floppy drive, and use the existing internal floppy cable to connect to the board’s IDC connector. I’m not sure that cable is long enough to reach outside the case, though.
  5. Same as above, but use a longer 20-pin IDC cable. You can use any generic IDC cable with straight-through wiring.

After some consideration, I included both the 6-pin AVR ISP programming connector, and the 14-pin Xilinx JTAG programming connector. My goal is to use the AVR to program the Xilinx CPLD, so the JTAG connector is just there as a fallback. The JTAG interface consists of 4 pins: TMS and TDO are connected to dedicated pins on the AVR, but the TCK and TDI pins are shared for other purposes, since there aren’t enough pins for everything. The CPLD’s JTAG controller should stay in the reset state as long as the value of TMS is held at 1, regardless of what values appear on TCK and TDI, so in theory this should work fine. We’ll find out soon!

Initial programming of the AVR will be done using the 6-pin ISP connector and an AVR ISP mkII programmer. It should be possible to do all further AVR reprogramming using a bootloader, loading the new firmware from the SD card. That means if I build one of these boards for someone else, I can do the initial programming, then they can update the AVR firmware later by just storing an update file on the SD card and rebooting the emulator. They won’t need to own an AVR programmer. And since the AVR will program the CPLD, that means the complete firmware of both chips can be updated without the need for any special programming hardware. That’s pretty cool.


Three Crazy Ideas

While I’m optimistic that the floppy write emulation technique described yesterday will work (at least for high speed cards), it would be great if I could buy an extra safety margin of time, or find a way of throttling the incoming data from the Macintosh during a write if it’s too fast. The biggest challenge is emulating the initialization of a floppy, where sectors to be written arrive from the Mac rapid fire, without stopping. Here are three slightly crazy ideas that just might work to handle the firehose of incoming data.

Floppy Driver Patch

One possibility is to write a custom INIT or extension that patches the floppy driver code in ROM, and extends the track step timeout from 12 ms to something much longer. This would be a simple change of just a few bytes, and it would enable the emulator to pause the incoming data after each track step, while it saved the previous track’s data to the SD card. Because there’s no problem with the speed of floppy read emulation, the INIT itself could still be loaded from the emulated floppy.

The major drawback of this approach is that it would force you to boot from a special Floppy Emu setup disk in order to load the INIT. I also don’t know anything about writing INITs and extensions, and I’m not sure if many different versions would be needed. Can the same INIT work with System 1.0 and System 9?

Faking An Error

In yesterday’s post, I said there’s no error mechanism that can be exploited to slow down the incoming data without causing the write operation to fail. I took another look at it today, and I think I may have found a way, by exploiting some code that measures the size of the gap between the last sector and the first sector on one side of a track. During initialization of a floppy, after the Mac finishes writing the last sector on a side, it immediately switches back to read mode to measure the gap before the next sector, and confirm that the next sector is sector 0.

The disk initialization code uses some kind of progress counter that starts with a value of 7. Every side written successfully increments the counter by 1. If the gap is the wrong size, the counter is decremented by 1. If the counter value is still greater than 4, it attempts to rewrite the side; otherwise it aborts with an error.

By intentionally generating a bad gap size after a full side is written, I can force the side to be rewritten. If I also make the emulator smart enough to detect when data written to a sector is identical to what was already there, then it can ignore the second rewrite. That effectively doubles the amount of time available for saving the track data to the SD card, since every side will be written twice by the Mac.

The bad gap size trick can only be done once per side, or else the progress counter will decrease and the initialization process will eventually fail, so it can’t buy an indefinite amount of additional time. It’s also a little risky, because it means the progress counter will never increase above 7, and any 3 other errors occurring during the initialization will cause it to fail.
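
As a sanity check on that arithmetic, the counter behavior described above can be modeled in a few lines of C. This is a sketch based only on the behavior described in this post; the function name `init_succeeds` and the retry loop structure are my own:

```c
#include <stdbool.h>

/* Hypothetical model of the initialization progress counter described
 * above: starts at 7, +1 for each side successfully written, -1 for each
 * bad gap, retries the side while the counter stays above 4. */
static bool init_succeeds(int sides, int bad_gaps_per_side) {
    int counter = 7;
    for (int side = 0; side < sides; side++) {
        for (int bad = bad_gaps_per_side; bad > 0; bad--) {
            counter--;                /* emulator fakes a bad gap */
            if (counter <= 4)
                return false;         /* too many errors: init aborts */
        }
        counter++;                    /* side finally accepted */
    }
    return true;
}
```

With one faked bad gap per side, the counter dips to 6 and recovers every time, so all 160 side writes (80 tracks, 2 sides) complete; with two faked gaps per side it reaches 4 on the second side and the initialization aborts.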

I did some simple tests of this idea that look promising. By disabling SD saves, I was able to perform floppy initialization to measure its write speed, even though the initialization ultimately failed during the verify phase. In my initial test, it took 34 seconds to complete the write phase of initialization. After I added emulator code to generate a bad gap after every other side write operation, the time increased to 59 seconds, with no obvious ill effects.

Zero Flag

During floppy initialization, the Mac writes 1600 sectors very fast. What’s in those sectors? Zeroes. Instead of buffering a 512 byte sector full of zeroes, I could just set a flag that says “this sector is all zeroes”. Using a bitfield, I could buffer an entire disk’s worth of zero sectors using just 200 bytes of RAM. Those sectors could then be saved to the SD card whenever it was convenient, after the floppy initialization was finished. If a read request arrived before all those zero sectors were saved to the card, the emulator could check the flag first to see if an all-zero sector should be synthesized instead of actually loading the sector data from the SD card.
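
A minimal sketch of that bitfield, assuming an 800K disk with 1600 sectors of 512 bytes each; the names `zeroMap`, `mark_zero`, `clear_zero`, and `is_zero` are hypothetical:

```c
#include <stdint.h>

#define NUM_SECTORS 1600  /* 800K disk / 512-byte sectors */

static uint8_t zeroMap[NUM_SECTORS / 8];   /* 200-byte bitfield */

/* Mark a sector as known-all-zeroes instead of buffering 512 zero bytes. */
static void mark_zero(uint16_t s)  { zeroMap[s >> 3] |=  (uint8_t)(1 << (s & 7)); }

/* Clear the flag once the sector is flushed to the SD card, or real data arrives. */
static void clear_zero(uint16_t s) { zeroMap[s >> 3] &= (uint8_t)~(1 << (s & 7)); }

/* On a read request, check this before loading the sector from the SD card. */
static int  is_zero(uint16_t s)    { return (zeroMap[s >> 3] >> (s & 7)) & 1; }
```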

I like this idea because it’s short and simple, though its usefulness is limited to floppy initialization only.


Floppy Write Emulation, Continued

My apologies for another post full of abstract floppy emulation thoughts, with nothing concrete to discuss and no photos to show. As readers have no doubt noticed, I’m having a difficult time wrapping my head around the best way to approach this problem. Fortunately, it’s slowly becoming clearer how to proceed.

Current Prototype

First, a review of the current prototype that I’ve built: the emulator performs SD loads and saves on demand, at the instant they’re required by the Mac. There is minimal RAM buffering. This “on the fly” access method works very well for floppy read emulation, using any type of SD card, and could be used to make a nice read-only floppy emulator. Sadly, it doesn’t work as well for write emulation. With a high speed (class 10) SD card it works for sector-by-sector write emulation, but it fails with slower SD cards, or with any speed card when doing whole-track writes or floppy initialization.

The Problem With Writes

The fundamental problem with writes is that data comes in from the Mac faster than it can be saved to the SD card. The only solutions are to save to the SD card faster, or slow down the rate of data transmission from the Mac. I’ve recently spent a substantial amount of time searching for a way to slow down the incoming Mac data, and I’ve concluded that it’s just not possible. The Mac blindly blasts out sectors to be written. There is no feedback signal, no flow control, no ready flag, and no error mechanism that can be exploited to slow down the incoming data without causing the write operation to fail. The only remaining path is to somehow speed up SD saves, so the emulator can keep up.

Using a different or faster microcontroller wouldn’t help much. Yes, it would reduce the amount of time needed to send the data to the SD card during a save, but that’s only a small part of the total save time. The bulk of the time is spent waiting for SD card internal operations to complete (Flash page erase, etc), and that’s independent of what microcontroller is used. Using a faster SD card will help, but even a class 10 card (the fastest class) struggles to keep up.

RAM Buffering

After a lot of thought, I’ve decided to switch back to the ATMEGA1284 microcontroller that I’d originally planned to use, which provides 16K of internal RAM that can be used for buffering. That’s enough RAM to buffer both sides of an entire track, with 4K space remaining for other uses. While not a panacea, the additional RAM will help in three ways:

Reads from RAM – Once all the sectors for a track have been read into RAM, the emulator can continuously stream them to the Mac in an endless loop, with no further SD access necessary. This frees 100% of the SD bandwidth for writes, in contrast to the “on the fly” method, which is constantly loading data from the SD card.

Multi-block SD Transfers – When modified sectors in RAM must be saved to the SD card, the save can be performed using a multi-block SD save, which is substantially more efficient than doing many individual single-block SD saves.

Data Rate Smoothing – A large buffer can help smooth brief fluctuations in the incoming data rate, or the SD save speed. For example, if the SD card is capable of saving 50 sectors per second, but the incoming data rate briefly shoots up to 100 sectors per second, a system with no buffering will fail. But assuming the buffer is large relative to the duration of the data rate spike, the buffered system will continue to work. The same is true for brief spikes in the time per sector needed for SD saves. The buffer keeps the system working as long as the average incoming data rate is less than the average save rate, instead of requiring the fastest instantaneous incoming data rate to be less than the slowest instantaneous save rate.
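
The smoothing argument can be illustrated with a toy simulation. This isn’t the emulator’s code, just a model of the claim: a brief burst of incoming sectors survives as long as the buffer is deep enough and the long-run average arrival rate stays below the drain rate. The function `peak_depth` and the sample numbers are mine:

```c
/* Toy model of rate smoothing: arrivals[i] sectors arrive in tick i, the
 * SD card drains drain_per_tick sectors per tick, and the buffer absorbs
 * the difference. Returns the peak buffer depth reached, or -1 if the
 * buffer capacity is ever exceeded (data lost, write fails). */
static int peak_depth(const int *arrivals, int n, int drain_per_tick, int capacity) {
    int depth = 0, peak = 0;
    for (int i = 0; i < n; i++) {
        depth += arrivals[i];
        if (depth > capacity) return -1;  /* overflow */
        if (depth > peak) peak = depth;
        depth -= drain_per_tick;
        if (depth < 0) depth = 0;         /* can't drain an empty buffer */
    }
    return peak;
}

/* Average arrival rate 1.4/tick, drain 2/tick, with a brief 4/tick spike */
static const int demo_arrivals[10] = { 1, 1, 4, 4, 1, 1, 0, 0, 1, 1 };
```

With a capacity of 8 the spike peaks at a depth of 6 and the system keeps working; with a capacity of 4 the same traffic overflows the buffer.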

Even with these advantages, I suspect there will still be some SD cards that aren’t fast enough for the fastest floppy write operations (floppy initialization). The slower class 4 PNY card I’ve been experimenting with probably won’t support initialization, because even using multi-block saves, it takes about 375 ms to save a track, but the Mac writes a new track roughly every 350 ms. This is a case where the average incoming data rate is greater than the average save rate, and there’s nothing more that can be done to help it. I expect this card will still work for disk copy operations and normal sector-by-sector write operations, however.

The Other Shoe Drops

OK then, just add more RAM buffering and then everything will be great, right? Well, no. In order to take advantage of the RAM buffers while still meeting the floppy timing requirements, the emulator firmware will need to be considerably more complex. The current prototype is a single-tasking microcontroller program: at any given time, it can be doing an SD card transfer or a Macintosh sector read/write transfer, but not both. The program will need to be enhanced so that both interfaces can be active at the same time, by handling one of them entirely in an interrupt routine.

The strategies used to decide when to load and save data from SD must also be more complex. For example, after the Mac sends a sector to be written to the emulator, should it be saved to the SD card right away? That would provide the biggest head start. Or should it wait a while, and see if more sectors are received, which could then be batched together into a multi-block SD save? Another example: when the Mac steps to a new track, should the emulator first save dirty sectors from the previous track, first load the sectors from the new track, or perhaps interleave the two operations? A third example: while the sectors for a new track are first being loaded, the Mac may begin a write operation, and may attempt to write the very sector that is currently being loaded from the SD card. How should that be addressed? There’s a lot to think about, and it’s not clear what the right answers to these questions are.

Sanity Check

To make sure I’m not doing something ridiculously stupid, I looked at two other hardware floppy emulator projects that seem most similar to this one.

The HxC Floppy Emulator is the closest relative to Floppy Emu. It uses a PIC, 32K SRAM, and an SD card to emulate floppy read/write operations for a variety of computers using Shugart style floppy interfaces. At a high level, HxC appears to use the same emulation method that I’ve proposed here. I don’t know if it supports high-speed write operations (initialization) on slow speed SD cards. The documentation doesn’t make any mention of a minimum required SD card speed, and the author hasn’t yet responded to my question about it in the forums. He did answer many of my general high-level questions, though, which was nice. It appears he’s quite concerned about other people cloning the HxC, and is reluctant to give out too much detailed implementation information. The firmware and hardware of HxC are closed and proprietary.

Semi-Virtual Diskette (SVD) is a simpler design, using a PIC and 256K of SRAM to store the disk image. There is no persistent storage, and disk images are loaded and saved to a host PC using a serial cable. As with the HxC emulator, computers using Shugart type floppy interfaces are supported. The 256K RAM size limits the possible floppy types to low-capacity ones. All the hardware schematics and firmware source files are available to download. The SVD project appears to be inactive– the author didn’t respond to my email, and the web site hasn’t been updated in several years.

There are also a couple of commercial floppy emulators, but they lack any real implementation info. At any rate, it appears that I’m headed down a reasonable path by introducing track-sized RAM buffering, even if it will introduce a whole mess of new complications to the emulator software. The next step is to build a real hardware prototype with an ATMEGA1284, and start working on the software revisions. Woohoo!


Floppy vs SD Card Write Speeds

Which would you guess supports faster write speeds: a modern high-speed class 10 SDHC Flash memory card, or a vintage 1984 floppy disk employing a mechanical stepper motor to move its drive head, spinning magnetic media at a lazy 400 RPM? For large writes of contiguous blocks, the SD card’s speed blows the floppy away. But for random single block I/O, or sequences involving many alternating reads and writes, the SD card struggles to match the performance of the lowly floppy. That’s not a great thing to discover when you’re halfway through designing a floppy disk emulator.

Let’s put some numbers on performance. The floppy can read or write a 512 byte sector in 12 ms, with no variability: the speed is always one sector per 12 ms. An 8GB PNY class 10 microSDHC card in SPI mode can read a 512 byte block in about 2-3 ms, with a 4 MHz SPI clock. The same card exhibits single block write times of typically 5-9 ms, but with occasional spikes up to 70+ms for a single block. Write times appear to be inherent in the card, and mostly unrelated to the SPI clock rate. So while the average write speed of the SD card is somewhat faster than a floppy, the speed is variable, and the worst case is slower than a floppy.
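
Putting those timings side by side as sustained transfer rates, with a quick back-of-the-envelope helper that uses only the numbers quoted above:

```c
/* Sustained transfer rate, in KB/s, implied by moving one 512-byte
 * sector every seconds_per_sector seconds. */
static double rate_kb_per_s(double seconds_per_sector) {
    return 512.0 / seconds_per_sector / 1024.0;
}

/* Using the timings quoted above:
 *   floppy, 12 ms per sector          -> ~41.7 KB/s
 *   SD typical write, ~7 ms per block -> ~71.4 KB/s
 *   SD worst-case spike, 70 ms        ->  ~7.1 KB/s (slower than the floppy)
 */
```

The averages favor the SD card, but the worst-case spike is nearly six times slower than the floppy’s fixed rate, which is exactly the problem.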

Class 10 SDHC Emulator Results

The good news is that the class 10 SDHC card is fast enough to support emulation of normal floppy writes, in which some number of sectors on an existing floppy are updated. I’ve been able to copy large files around on the emulated floppy disk reliably, using the new class 10 card. This type of write actually follows a read-write-read-write pattern, as the Mac alternately reads to find the address section of the desired sector, then writes to replace the sector’s data section. Following each write, the emulator takes 5-9 ms to write the data to the SD card, while supplying sync bytes to the Mac. The Mac sees this as an extra-large intersector gap while attempting to read the next address section. It will tolerate gaps of up to roughly 23 ms, although this will make writing files noticeably slower than a real floppy.

The bad news is that the class 10 SDHC card is not fast enough to support emulation of continuous floppy writes, such as those during initialization of a floppy, or when doing a full-disk write with a disk copy program. This type of write is just a constant stream of incoming bytes, at a rate of one sector per 12 ms. The emulator cannot stall after the first sector to perform an SD write, because the second sector is already inbound. To address this I implemented a double-buffered system, which uses an interrupt routine to read the next sector’s data into a new buffer, even while the data from the old buffer is being written to the SD card. Unfortunately, the overhead of the interrupt routine increases the SD write time to 12+ ms, so the emulator simply can’t keep up with the incoming data. Using more than two buffers might help, if there were enough RAM for them, but the average SD write time would still need to be under 12 ms. Buffering helps recover from occasional “burps” where a write takes longer than 12 ms, but it can’t improve the overall write speed.

Incidentally, while studying continuous write behavior, I discovered that sectors in a Macintosh floppy track are interleaved like 0 6 1 7 2 8 3 9 4 10 5 11, rather than appearing in consecutive order by sector number.
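
For the curious, that order can be reproduced with a one-line formula. This is my own reconstruction for the 12-sector case (the outermost speed zone; inner zones have fewer sectors per track), not code from the emulator:

```c
/* Logical sector stored at physical position pos on a 12-sector track:
 * even positions hold sectors 0-5 in order, odd positions hold 6-11,
 * giving the observed order 0 6 1 7 2 8 3 9 4 10 5 11. */
static int sector_at(int pos) {
    return (pos >> 1) + (pos & 1) * 6;
}
```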

Arrgh!!

This whole business of emulator write support is turning into quite a pain, causing the fun value of the project to drop steeply. To make matters worse, I somehow managed to brick the class 10 card while I was experimenting with raw SD block operations, and I don’t have any device that will reformat it. As much as write support is an essential part of floppy emulation, I’m questioning how much more time it makes sense for me to sink into it. I’m therefore tempted to eliminate write emulation entirely, release a design for a read-only floppy emulator, and leave it at that.

My brain is struggling with the details of card performance in single-block and multi-block write modes, various buffering schemes, and timeout values for Macintosh I/O operations. My gut tells me there must be some clever way to use buffering and/or multi-block writes to get reliable write performance, even with a slow SD card, but so far I haven’t found a solution. And regardless of any amount of buffering or other clever schemes, I believe that if any single block write takes more than about 12 ms (the max track step time) + 23 ms (the max intersector delay), then emulation could fail. The computer might do a write immediately followed by a track step and then a read, which couldn’t be serviced until the write finished.

I looked again at the HxC Floppy Emulator, which uses a PIC and a 32K SRAM to emulate floppies for other classic computers. The author has been kind to answer many of my questions about its inner workings, but I don’t know the buffering strategy it uses, or whether it’s subject to failure in the same worst cases as my design.

Write Options

Some other possibilities, short of eliminating write support completely:

Experimental Write Support – I could leave the write emulation as it is now, and call write support “experimental”. It would be a crap shoot whether it worked or not, depending on the card that was used. I think normal writes from Finder file operations would work on most class 10 cards, but continuous writes (disk initialization and disk copying) wouldn’t. Maybe that’s acceptable.

Strict Card Requirements – The author of SdFatLib ran one of my performance tests, and got substantially better write performance using two different models of SanDisk Extreme cards than I saw on my class 10 PNY card. Unfortunately my local store didn’t have any high speed SanDisk microSDHC cards. If those cards work reliably, I could make them “required” for write support, but I’m uncomfortable with that idea. Even if it worked, I wouldn’t be any closer to understanding why some cards work and some don’t. I’d also be faced with the task of continuously testing new cards as the old SanDisk ones were obsoleted and replaced with different models.

Multi-Sector Writes – Instead of single block writes, I could use the SD multi-block write method. Using this method, you tell the card “I’m writing N blocks beginning at location L”, and it pre-erases all the blocks, then writes them quickly as they arrive. This makes individual block writes much faster, but requires the pre-erase step, and also requires knowing how many blocks you’re going to write before you write the first one. That’s not possible when writing blocks as they arrive from the Macintosh, since it’s never known when the floppy write will end. If many sectors were buffered in RAM first, then they could be written in a multi-block write, but the length of the multi-block operation would present its own challenges. What would happen if during the long multi-block write, the Mac decided to step to a different track and begin reading new sectors?

A related method I’ve yet to try is to erase the SD block as soon as the data for it begins to arrive from the Mac, instead of waiting until the entire sector is received from the Mac before doing an SD erase and write. I’m not even sure that’s possible, but it seems like it would help.

Code Optimization – I believe that continuous writes with the class 10 PNY card are falling just short of the necessary average speed. If I could optimize the interrupt routine to reduce its overhead, it might work. Without substantial buffering, however, continuous writes would still fail whenever there was a single anomalous slow write, even if the average write speed were fast enough.

More RAM Buffering – By using an AVR with more internal RAM, I could buffer more sectors during writes. That feels like it should help somehow, but I’m not certain it actually would. With my current code, normal writes don’t use buffering. The class 10 card doesn’t need buffering, since the SD write is performed during the intersector gap before the next read. The class 4 card I tested earlier had such strange latency patterns (12 consecutive writes of 50-80 ms) that no amount of buffering would help it. In fact, buffering would provide no benefit, because a second write from the Mac cannot begin until the first write to the SD card finishes, the next sector is read from the card, and the Mac reads that sector’s address section.

Additional buffering would help somewhat with continuous writes, if they can be optimized enough so that their average write time is fast enough. A large buffer could also be used to read a full track into RAM at once, then play it back from RAM instead of continuously reading it from the card. That would enable SD writes to happen without blocking SD reads of sectors in the same track. However, a similar blocking problem would still occur if the Mac stepped to a different track and began to read sectors there, while a long SD write operation was monopolizing the card.

Buffer the Whole Disk – The extreme of buffering is to use 800K of external SRAM to buffer the entire disk. Maybe that’s a sensible idea, and it would certainly work, but I’m very reluctant to do it. Aside from the additional pins needed and the cost of the parts, it just feels wrong. HxC is proof that floppy emulation should be possible without a full disk buffer.

Whew!

Documenting the possible options here has been an exercise in organizing my own thoughts, more than an attempt to explain them to others, so I hope it’s comprehensible. It’s starting to feel a bit like I’m launching into a graduate thesis project! That’s not a good sign, and I’m concerned I’ve already spent more time experimenting with write support than makes sense. Once I (hopefully) unbrick my class 10 SD card, I’ll try a few more experiments to see if I can improve write performance further. But after that, I think I’m going to return to the hardware design, and plan to use an ATMEGA1284P with 16K of internal RAM. Any further improvements to write emulation will then have to be done entirely in firmware, within the limitations of that hardware.


SD Write Speed Anomalies

I have floppy disk write emulation almost working for Floppy Emu, using an almost painfully simple technique. When the Mac sends data to be written, an interrupt routine decodes it and stores it in a RAM buffer. Once a full sector has been stored, the main code path performs a blocking write to the SD card. If the Mac attempts to read the floppy during this time, it will see nothing but sync bytes. If it attempts to do another write, the write will be ignored.

This method works because the Mac doesn’t actually write a whole string of sectors at a time. Instead, it performs an alternating pattern of reads and writes for each sector. It reads the disk, waiting until the address section for sector N passes the drive head. Then it switches to write mode, and overwrites the data section for sector N with new data. After the write, it switches back to read mode and looks for the address section for sector N+1. The string of sync bytes provided by Floppy Emu while it’s writing the SD card is interpreted by the Mac as a normal (if unexpectedly long) intersector gap.

Using this method, I’ve successfully performed one-sector writes with a test app. I’ve also had some success copying larger files (50-100K) onto the emulated floppy with the Finder– sometimes it works, sometimes not.

While investigating why the writes sometimes fail, I discovered something strange. With my microcontroller at 16 MHz and a 4 MHz SPI clock, using SdFatLib to write a single aligned 512-byte block normally takes 3-7 milliseconds. That’s fast enough to keep the Mac happy. But I’m seeing a consistent pattern where after 8N consecutive blocks written, the write time jumps to 50-80 ms per block for the next 12 blocks, then returns to normal. In other words, it will write some multiple of 8 blocks at 3-7 ms per block, then write the next 12 blocks at 50-80 ms per block, before returning to the original 3-7 ms speed. 50-80 ms is too slow for the Mac, so it aborts, and the write operation fails.

In most cases N=4, so the strange behavior begins after 32 blocks. This seems to be true no matter what portion of the SD card file I write into, for consecutive sectors in both increasing and decreasing order. The code is (roughly):

	// time some writes at a random position within the file
	uint16_t sect[100];
	uint8_t time[100];
	for (uint16_t s = 900, c = 0; s < 1000; s++, c++)
	{
		uint32_t writePos = (uint32_t)s * SECTOR_DATA_SIZE; // cast avoids 16-bit overflow on AVR
		f.seekSet(writePos);
		uint32_t t1 = millis();

		// save the sector
		if (f.write(sectorBuf, SECTOR_DATA_SIZE) != SECTOR_DATA_SIZE)
		{
			// write error
		}
		uint32_t writeTime = millis() - t1;

		sect[c] = s;
		time[c] = writeTime;

		_delay_ms(5);
	}

This may well be a problem with my SD card, or something strange about SdFatLib, but I’m unsure where to go next to troubleshoot it further. None of the write methods I’ve looked at will tolerate 50-80 ms write times, short of buffering the entire 800K disk image in an external RAM. The consistency of the 8N fast blocks followed by 12 slow blocks makes me suspect some kind of cache or buffer somewhere is filling up. But then I would expect all further writes to be slow, instead of returning to normal speed after 12 slow writes.


Violating Setup Times With Floppy Writes

I’m working on adding write support for the Floppy Emu emulated Macintosh floppy drive. Data coming from the Macintosh to be written to the floppy is encoded in an interesting way. There’s no clock signal, just a single WR data signal. The incoming WR data is divided into bit cells of “about 2 microseconds” duration, according to the IWM datasheet. At each bit cell boundary, a high-low or low-high transition indicates a logical 1 bit, and no transition indicates a logical 0 bit.

This technique presents some challenges to the device that’s decoding the WR data. Without a clock, how does it know when to sample the data for the next bit? And without some kind of framing reference, how does it identify the boundaries between bytes?

Instead of sampling bits at some fixed frequency, my solution uses 16x oversampling to measure the duration between WR transitions. A measured duration of about 2 microseconds (with some error tolerance) is interpreted as a 1, about 4 microseconds is 01, and about 6 microseconds is 001. Durations longer than 6 microseconds should never appear, since the GCR encoding method forbids having more than two consecutive 0 bits.
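
The duration-to-bits mapping can be sketched as a simple classifier. This is a model of the scheme described above, not the actual CPLD logic; the clock counts assume the 8 MHz clock used below, where 16 clocks equal one 2 microsecond bit cell, and the exact tolerance windows are my own choice of half a cell either way:

```c
/* Number of bits encoded by the gap between two WR transitions, with the
 * gap measured in 8 MHz clocks (16 clocks = one 2 us bit cell).
 * Returns 1 for "1", 2 for "01", 3 for "001", and 0 for an illegal gap. */
static int bits_in_gap(int clocks) {
    if (clocks >= 8  && clocks < 24) return 1;   /* ~2 us: "1"   */
    if (clocks >= 24 && clocks < 40) return 2;   /* ~4 us: "01"  */
    if (clocks >= 40 && clocks < 56) return 3;   /* ~6 us: "001" */
    return 0;  /* longer gaps never occur in valid GCR data */
}
```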

To identify the boundaries between bytes, the circuit uses the fact that all valid GCR bytes have a most significant bit of 1. If the MSB of the shift register is 1, it saves the completed byte, and clears the shift register. Assuming it starts at a random location in the bit sequence, the circuit will eventually sync up with the byte boundaries, though it may take several bytes. Fortunately the Apple designers planned for this, and each sector begins with a string of 10-bit sync bytes, 1111111100. No matter where it starts in the sequence, a shift register using this byte identification technique will get in sync after no more than five sync bytes.
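
To convince myself of the five-sync-byte claim, here is a small C model of the framing rule: shift bits in, and whenever the MSB is 1, emit the byte and clear the register. Feeding it the 10-bit sync pattern from every possible starting offset shows it always aligns within 50 bits, i.e. five sync bytes. The function name and return convention are mine:

```c
#include <stdint.h>

/* Feed the framer a stream of 10-bit sync bytes (1111111100 repeated),
 * starting at an arbitrary bit offset, and return how many bits pass
 * before it emits a correctly aligned 0xFF sync byte. */
static int bits_until_sync(int start_bit) {
    static const char pattern[] = "1111111100";
    uint8_t shifter = 0;
    for (int i = 0; i < 100; i++) {
        int bit = pattern[(start_bit + i) % 10] - '0';
        shifter = (uint8_t)((shifter << 1) | bit);
        if (shifter & 0x80) {
            if (shifter == 0xFF)
                return i + 1;   /* framer is now byte-aligned */
            shifter = 0;        /* misaligned partial byte discarded */
        }
    }
    return -1;  /* never happens with this input */
}
```

The worst starting offset in this model needs 41 bits, comfortably inside the five-sync-byte bound.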

The waveform above shows a simulation of the start of a sector, consisting of five sync bytes followed by D5 AA 96, the sector header identifier. The top trace is the WR signal, and the bottom trace is the output of the shifter/decoder circuit. Here’s my first version of the Verilog code, using an 8 MHz input clock, where 16 clocks equals 2 microseconds.

reg [7:0] shifter;
reg [7:0] wrData;
reg [4:0] bitTimer;
reg wrPrev;

always @(posedge clk) begin
  // was there a transition on the wr line?
  if (wr != wrPrev) begin
    // has at least half a bit cell time elapsed since the last cell boundary?
    if (bitTimer >= 8 ) begin
      shifter <= { shifter[6:0], 1'b1 };
    end
    // do nothing if the clock count was less than 8

    bitTimer <= 0;
  end
  else begin
    // have one and a half bit cell times elapsed?
    if (bitTimer == 24) begin
      shifter <= { shifter[6:0], 1'b0 };
      bitTimer <= 8;
    end
    else begin
      // has a complete byte been shifted in?
      if (shifter[7] == 1) begin
        wrData <= shifter; // store the byte for the mcu
        shifter <= 0; // clear the byte from the shifter
      end	

      bitTimer <= bitTimer + 1'b1;
    end
  end

  wrPrev <= wr;
end

I implemented the circuit as above, and it mostly worked. The output was recognizably close to what was expected, but with lots of seemingly random bit errors. The errors weren’t consistent, and comparing the output to the expected values, they didn’t appear to be systematic either. I was hoping they might all be cases of a 0 turning into a 1, or all cases of a 1 turning into a 0, or all cases of a single bit being added or removed from the sequence, but it was nothing like that. I couldn’t find any identifiable pattern to the errors at all.

A day passed. I chased after theories involving voltage levels, bus contention, poor wiring, and others.

Finally I got to thinking about the timing relationship between the WR signal and the 8 MHz clock– there is none. I should have realized this earlier, since it’s nearly the same problem I had a few weeks back with the LSTRB signal when I was implementing read support. WR might transition right at an 8 MHz clock edge, so that its sampled value is neither a clean logical 0 nor 1, but somewhere in between. What happens then?

Naively, I had thought it would either do the 0 behavior, or the 1 behavior. In this example, it would either do the first if block and add a 1 to the shifter, or else it would do the second if block, and check the timer to see if it should add a 0 to the shifter. It wouldn’t really matter which behavior it did– a transition on WR would either add a 1 to the shifter on clock cycle N or N+1, but it would still get added. The test for bitTimer >= 8 would make sure that an apparent double-transition of WR didn’t accidentally add two 1’s. Everything would work great.

If only it were so simple. The registers bitTimer, shifter, and wrData are composed of many individual macrocells in the CPLD, one macrocell per bit. Each macrocell will decide independently if wr != wrPrev at the clock edge. What happens if they don’t all agree, and some macrocells think there was a transition, and others don’t? You get a big mess of random errors, which is exactly what I was seeing. This is why a synchronous system would impose a setup time on WR, to make sure its value was established long enough before the clock edge to ensure that every macrocell saw the same value. This isn’t a synchronous system, though, and there’s no way to guarantee that WR won’t change states at a bad time.

Fortunately the solution is simple: just send WR to a register, then use the register value in the circuit instead of WR. That means the circuit will be using a value of WR that’s delayed one clock from the “live” value, but that’s not a problem here. Because the value of the register will only change at a clock edge, the circuit that uses the value won’t see it change states at a bad time, and setup time requirements will be met. This technique is probably second nature to many readers, who’ve been shouting at their monitors for the past six paragraphs, but it took me a while to figure out. The code changes look like this:

reg [1:0] wrHistory;

always @(posedge clk) begin
  // was there a transition on the wr line?
  if (wrHistory[1] != wrHistory[0]) begin
    // ... remaining code is the same as before

  // ... 

  wrHistory <= { wrHistory[0], wr };
end

With that change, I’m now able to reliably parse floppy write data coming from the Mac. Next up: reading the data with an interrupt routine, and saving it to the SD card.
