
Archive for the 'Floppy Emu' Category

Limiting SD Card Inrush Current

I’m experimenting with methods to limit the inrush current when an SD card is inserted, and beginning to wonder whether my solutions are worse than nothing.

When an SD card is inserted into a board that’s already powered on, a large amount of current will flow briefly, as the card’s internal capacitors are charged through its 3.3V supply pin. This is called inrush current. If the inrush current is too large, it can overtax the main board’s voltage regulator and capacitors, causing the board’s supply voltage to drop temporarily. If the voltage drops far enough, it may cause the board’s microcontroller to do a brownout reset. That’s what happened with early versions of the Floppy Emu. It wasn’t really a problem, because you’ll almost always want to perform a reset anyway after inserting a new card, but it was slightly annoying.

In later versions of the Floppy Emu, I added a 1 uH inductor and 10 uF capacitor for the SD card, as shown in the circuit schematic above. Later the capacitor was changed to a 33 uF tantalum. The purpose of the inductor was to limit the inrush current, preventing the main board’s supply voltage from sagging and causing a brownout. And it worked, mostly, as confirmed by observing the main supply and SD card supply voltages on a scope during card insertion. The exact behavior depended on the brand and type of SD card and the card’s internal circuitry. Some types of cards still caused a brownout reset when hot-inserted, but it was rare.

Revisiting this question again recently, I noticed that the inductor created a new issue that may be worse than the one I was trying to solve. When the SD card is inserted, its 3.3V supply pin doesn’t go cleanly from disconnected to connected. Instead it bounces and wiggles over a period of microseconds to milliseconds, just like the contacts of a mechanical switch. As a result, the inrush current isn’t one single burst, but a series of short on/off current pulses. Because of the presence of the inductor in the circuit, these pulses create voltage spikes on the SD card’s 3.3V supply pin. They’re brief – lasting about 100 ns – but some of the spikes go above 4V. Despite their brevity, I’m wondering if they’re high enough to damage the SD card.
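As a rough sanity check on those spikes, the inductor relation V = L × di/dt can be rearranged to ask how large a current step would be needed to lift a 3.3V rail above 4V. The sketch below is only a back-of-envelope estimate: it assumes the current changes linearly over the roughly 100 ns duration of a spike, and the function names are invented here for illustration.

```c
#include <stdio.h>

/* Hypothetical helper, not from the original post: the current change
   (in amps) needed to produce spike_volts across an inductor of
   inductance_henries, assuming the current ramps linearly over
   duration_seconds. Rearranged from V = L * di/dt. */
double current_step_for_spike(double spike_volts, double inductance_henries,
                              double duration_seconds)
{
    return spike_volts * duration_seconds / inductance_henries;
}

/* Prints the estimate for the values mentioned in the post:
   a 1 uH inductor, a 100 ns spike, and a rise from 3.3V to 4V. */
void print_spike_estimate(void)
{
    double overshoot = 4.0 - 3.3; /* volts above the 3.3V rail */
    double amps = current_step_for_spike(overshoot, 1e-6, 100e-9);
    printf("current step: %.0f mA\n", amps * 1000.0);
}
```

Under those assumptions the answer is about 70 mA, a fairly small current step, so contact bounce interrupting even a modest charging current could plausibly account for spikes of that size.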

Using an inductor seems to be a pretty standard solution for SD card inrush current, but I’ve never seen any discussion of the voltage oscillation and spikes this can cause for the card’s supply. An alternative is a power management IC with “soft start” behavior, but I’m not interested in adding extra chips in this case. I’m starting to think it may be best to remove the inductor, and connect the card’s 3.3V supply directly to the board’s 3.3V supply. Better to cause a nuisance brownout due to high inrush current, than to risk damaging the card with voltage spikes – and still have brownouts sometimes anyway. Have you ever dealt with this topic? How did you address it?


Apple IIc Drive Switcher Version 2

Version 2 of the Apple IIc Internal/External Drive Switcher is here, and is working nicely. It’s mostly the same as version 1, except for a few tweaks to the piece that goes inside the IIc case. Although the inside fit is still very tight, with this new version it’s a little more forgiving. I also updated the switch labels on the external portion of the drive switcher, with an icon showing parallel arrows for normal mode, and crossed arrows for drive swap mode.

In my previous trials with the drive switcher, I threaded two wires through a gap in the Apple IIc case around the 19-pin disk port. That works, but I’ve concluded it’s easier to thread the wires through the gap around the composite video connector instead, as shown in the photo below. There’s a little bit more wiggle room, and it avoids blocking the faceplate of the male DB-19 connector when the external portion of the drive switcher is plugged in. Threading the wires around the video connector doesn’t cause any blockage problems, and I had no difficulty plugging in the video cable. You could also thread the wires around the DB-15 monitor port, which most people aren’t using anyway.

I think it’s ready! Now I just need to assemble a few of these and get them ready for sale. If you’re interested in helping to beta test the first few units, please let me know.


10000 More DB-19 Connectors

Oops, I did it again: another 10000 DB-19 connectors fresh from the factory! After helping to resurrect this rare retro-connector from the dead in 2016, and organizing a group of people to share the cost of creating new molds for manufacturing, I had some of the 21st century’s first newly-made DB-19s. The mating connector is found on vintage Apple, Atari, and NeXT computers from the 1980s and 1990s, so having a new source of DB-19s was great news for computer collectors.

But that was two years ago. After manufacturing, the lot of connectors was divided among the members of the group buy, leaving me with “only” a few thousand. Since then I’d used up more than half of my share in assembly of the Floppy Emu disk emulator, and I began to get nervous about the looming need for a re-order. It was such a big challenge the first time finding a Chinese manufacturer for the DB-19s, and the all-email company relationship was tenuous. What if they lost the molds? What if my contact there left the company? What if the company went out of business? Even though I didn’t absolutely need more DB-19 connectors until 2019 or 2020, I decided to lock in my future supply and order more now.

I needn’t have worried, and the transaction went smoothly. With no mold costs to pay this time, the only challenge was meeting the 10000 piece minimum order quantity. I was even able to pay via PayPal, instead of enduring the hassles and weird scrutiny of an international bank wire transfer like I did in 2016.

So now I have a near lifetime supply of DB-19 connectors. Call me strange, but it actually gives me a warm fuzzy feeling. Now to find someplace to store all these boxes…


Apple IIc Internal/External Drive Switcher

If you’re using a Floppy Emu disk emulator with an Apple IIc, you’ll want to see this: a switched adapter that can reassign the external 5.25 inch drive as internal, and the internal 5.25 inch drive as external. This little gizmo helps to work around the Apple IIc’s inability to boot from an externally-connected 5.25 inch drive. That shortcoming is a headache for 5.25 inch disk emulators like Floppy Emu. With this internal/external drive switcher, the limitation is now gone!

 
Background

The IIc has an internal built-in 5.25 inch floppy drive. The internal drive appears to the computer as slot 6, drive 1. If you connect an external 5.25 inch floppy drive, it will appear to the computer as slot 6, drive 2. Unfortunately the whole Apple II family is designed to check for a bootable disk in drive 1 only. The computer can boot from drive 1, and then use drive 2 as a secondary disk, but it can’t boot from drive 2. So for the IIc with its built-in drive 1, this means it can never boot from an external 5.25 inch drive.

An important detail: this limitation only applies to the Apple IIc with an external 5.25 inch drive. An external Smartport drive (like Floppy Emu when configured for Smartport hard disk emulation mode) appears to the computer as slot 5, drive 1, and is bootable.

Apple IIc owners who want to boot from an emulated 5.25 inch disk image are in a difficult spot. Until now, their best option has been to remove the top panel from the IIc, disconnect the internal floppy drive, and connect the Floppy Emu to the internal drive connector on the motherboard. This works fine for the Floppy Emu, but it means IIc owners forfeit their ability to use the internal drive.

 
How the Drive Switcher Works

There’s almost no difference between the internal drive connector on the motherboard and the external drive connector at the rear of the Apple IIc. Although they’re different shapes and even have different numbers of pins, they feature the exact same disk IO signals except one: the drive enable signal. To perform this drive switcheroo, the adapter needs to tap into the signals from the motherboard, divert the enable signal externally, and route the external enable signal back inside to the internal drive. This is easily accomplished with some headers and wires and a slide switch, but the tricky part is making it all small enough to fit inside the Apple IIc case.

A slide switch makes the drive remapping optional. At one switch position, the external drive will appear as drive 1 and the internal as drive 2. At the other switch position, the internal drive will appear as drive 1 and the external as drive 2. Now Apple IIc owners can have the best of both options.
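The routing can be modelled in a few lines of C. This is only a logical sketch of the swap, not a description of the real wiring: the type and function names are made up for illustration, and the enable signals are treated as plain true/false values without regard to their electrical polarity.

```c
#include <stdbool.h>

/* Illustrative names only; they do not come from the adapter design. */
typedef enum { SWITCH_NORMAL, SWITCH_SWAPPED } SwitchPosition;

typedef struct {
    bool internal_drive_enabled;
    bool external_drive_enabled;
} DriveEnables;

/* drive1_select / drive2_select are true when the computer is selecting
   that drive number. In the normal position drive 1 is the internal drive
   and drive 2 is the external one; in the swapped position the two enable
   signals are exchanged. */
DriveEnables route_drive_enables(SwitchPosition position,
                                 bool drive1_select, bool drive2_select)
{
    DriveEnables out;
    if (position == SWITCH_SWAPPED) {
        out.internal_drive_enabled = drive2_select;
        out.external_drive_enabled = drive1_select;
    } else {
        out.internal_drive_enabled = drive1_select;
        out.external_drive_enabled = drive2_select;
    }
    return out;
}
```

Every other disk signal passes straight through, which is why swapping this one pair is enough to exchange the drive numbers.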

 
The Hardware

This is a two-part device: a signal tap that should be installed inside the Apple IIc, and a modified DB19 adapter with a slide switch for the external connection. Two female-female jumper wires are passed through a gap in the case to make the connection between the two parts.

 

The signal tap portion of the drive switcher looks a little peculiar, and it’s a minor challenge to solder closely-spaced through-hole components to the top and bottom of a PCB like this, but it works. The top is just a standard 20-pin male shrouded header, with a polarizing key like the one used on Apple drive cables. The bottom is a PCB-mounted female version of the same connector – not a very common part, but fortunately Digikey has it. The only other component here is a 2-pin male header for attaching the jumper wires.

Step 1 is to remove the top panel from the Apple IIc (follow the instructions here), and locate the ribbon cable that connects the internal floppy drive to the motherboard.

Disconnect the ribbon cable from the motherboard, and plug the signal tap into the motherboard in its place.

Then plug the ribbon cable into the signal tap. Also connect one end of each jumper wire to one of the signal tap’s male header pins. Here I chose to connect the brown wire to pin 1, and the red wire to pin 2. I’ll need to make the same choice later for the external jumper wire connections.

Before closing the case, it’s important to squish the ribbon cable down into the gap between the signal tap and internal floppy drive bracket. Push it down as far as it will go. This will make it easier to fit the top cover back on later. Notice the difference between the ribbon cable position in this photo as compared to the previous one:

Now it’s time to close the case. First, set the top panel loosely on the IIc, and thread the jumper wires through the opening for the disk connector in the rear of the case.

Then reinstall the top panel. It’s a snug fit, but there’s a large enough gap between the top panel and the rear connectors for the jumper wires to squeeze through. As an alternative, the jumper wires can also be threaded through the opening for the printer port or the video connector. After the top panel is reinstalled, it should look like this:

Connect the jumper wires to the 2-pin male header on the switched DB19 adapter, remembering to use the same color-to-pin mapping as before. Then plug the DB19 adapter into the Apple IIc’s external disk port. It will replace the standard DB19 adapter that’s included with the Floppy Emu.

Finally, connect the Floppy Emu’s 20-pin ribbon cable to the switched DB19 adapter. All done! This Apple IIc can now boot Choplifter and other 5.25 inch disk image favorites from the Floppy Emu, while retaining the internal 5.25 inch floppy drive for secondary needs like disk copying. Or at the flick of a switch, the IIc can be restored to normal operation, with the internal floppy drive configured as the boot drive.

 
Coming Soon

I hope to have the IIc Internal/External Switcher ready for the BMOW store in a month or two. There are still a few wrinkles to iron out before it’s ready. Because it’s such a tight fit inside, I need to get feedback from some other IIc owners to verify the switcher fits their computers too. I also want to revise the PCB a bit, to make the switcher easier to assemble. And I’d like to provide more meaningful labels for the switch positions than simply “A” and “B”. If there were enough space, I’d label the switch positions something like “normal” and “swapped”, but the adapter is so small that there’s only room for 1 or 2 letters at most. Any great suggestions?


More on Fast Interrupt Handling with Cortex M4

Can a fast microcontroller replace external glue logic, while also continuing to run application code? This is the third in a series of posts considering the question. It’s part of a potential simplification of my Floppy Emu disk emulator hardware, whose present design combines an MCU and a CPLD for glue logic. For readers who haven’t seen the first two parts, you can find them here. Read these first, including the comments discussion after the post body. Go ahead, I’ll wait.

Thoughts on Floppy Emu Redesign
Thoughts on Low Latency Interrupt Handling

There are several pieces of CPLD glue logic that I’m hoping to replace with interrupt handlers on a Cortex M4 microcontroller, specifically the 120 MHz Atmel SAMD51 Cortex M4. The most challenging is a piece of logic that behaves like a 16:1 mux, and must respond within 500 ns to any change on its address inputs. There’s also a write function that behaves a little like a 4-bit latch, as well as some enable logic. I haven’t yet done any real hardware testing, but I’ve spent many hours reading datasheets, writing code, and examining compiler output. I’ll save you the suspense: I don’t think it’s going to work. But it’s close enough to keep it interesting.

 
Coding an Interrupt Handler

A 120 MHz MCU means there are 120 clock cycles per microsecond. To meet the 500 ns (half a microsecond) timing requirement for the mux logic, the MCU needs to do its work in 60 clock cycles. Cortex M4 has a built-in interrupt latency of 12 clock cycles before the interrupt handler begins to run, so that leaves just 48 clock cycles to do the actual work. At best that’s enough time for 48 instructions. In reality it will be fewer than 48, due to pipeline issues, cache misses, branches, flash memory wait states, and the fact that some instructions just inherently take more than one clock cycle. But 48 is the theoretical upper bound.
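The budget arithmetic can be written as a small helper, which makes it easy to try other clock speeds or deadlines. The function name and parameters are hypothetical, introduced only for this sketch, and the result is the same theoretical upper bound described above, not a measured figure.

```c
#include <stdint.h>

/* Hypothetical helper: clock cycles left for the handler body after
   subtracting the fixed interrupt entry latency from the cycles that fit
   within the deadline. Returns 0 if the latency alone uses up the
   deadline. */
uint32_t handler_cycle_budget(uint32_t clock_hz, uint32_t deadline_ns,
                              uint32_t entry_latency_cycles)
{
    /* cycles = clock_hz * deadline_ns / 1e9, in 64 bits to avoid overflow */
    uint64_t total = ((uint64_t)clock_hz * deadline_ns) / 1000000000u;
    if (total <= entry_latency_cycles)
        return 0;
    return (uint32_t)(total - entry_latency_cycles);
}
```

Calling `handler_cycle_budget(120000000, 500, 12)` gives the 48 cycles worked out above, and doubling the deadline to 1000 ns would give 108.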

I spent a while digging through the heavily-abstracted (or should I say obfuscated) code of Atmel Start, the hardware abstraction library provided for the SAMD51. Peeling back the layers of Start, I wrote a minimal interrupt handler that directly manipulates the MCU configuration registers for maximum speed, rather than using the Start API. I ignored the write latch and the enable logic for the moment, and just wrote an interrupt handler for the 16:1 mux function. Bearing in mind this code has never been run on real hardware, here it is:

volatile uint32_t selectedDriveRegister;
volatile uint32_t driveRegisters[16];

void EIC_Handler(void) 
{
	// a shared interrupt handler for changes on five different external pins:

	// EXTINT0 = PA00 = SEL - interrupt on rising or falling edge
	// EXTINT1 = PA01 = PH0 - interrupt on rising or falling edge
	// EXTINT2 = PA02 = PH1 - interrupt on rising or falling edge
	// EXTINT3 = PA03 = PH2 - interrupt on rising or falling edge
	// EXTINT4 = PA04 = PH3 - interrupt on rising edge
	// PA11 = output

	uint32_t flags = EIC->INTFLAG.reg; // a 1 bit means a change was detected on that pin

	// clear EXTINT0-4 flags, if they were set.
	EIC->INTFLAG.reg = (flags & 0x1F); // writing a 1 bit clears the interrupt flags

	// mask the 4 lowest bits and use them as the address of the desired drive register
	selectedDriveRegister = PORT->Group[GPIO_PORTA].IN.reg & 0xF; 

	// don't need to check if drive is enabled. 
	// output enable will be handled externally in a level shifter.
	switch (selectedDriveRegister)
	{
		case 7: 
			// motor tachometer
			// enable peripheral multiplexer selection
			PORT->Group[GPIO_PORTA].PINCFG[11].bit.PMUXEN = 1; 
			// choose TIMER/COUNTER1 peripheral
			PORT->Group[GPIO_PORTA].PMUX[11>>1].bit.PMUXO = MUX_PA11E_TC1_WO1; 
			break;

		case 8:
			// disk data side 0
			// enable peripheral multiplexer selection
			PORT->Group[GPIO_PORTA].PINCFG[11].bit.PMUXEN = 1; 
			// choose SERCOM0 peripheral
			PORT->Group[GPIO_PORTA].PMUX[11>>1].bit.PMUXO = MUX_PA11C_SERCOM0_PAD3; 
			// TODO: main loop must check selectedDriveRegister to see if it's 8 or 9 when adding 
			// new bytes to SPI
			break;

		case 9:
			// disk data side 1
			// enable peripheral multiplexer selection
			PORT->Group[GPIO_PORTA].PINCFG[11].bit.PMUXEN = 1; 
			// choose SERCOM0 peripheral
			PORT->Group[GPIO_PORTA].PMUX[11>>1].bit.PMUXO = MUX_PA11C_SERCOM0_PAD3; 
			// TODO: main loop must check selectedDriveRegister to see if it's 8 or 9 when adding
			// new bytes to SPI
			break;

		default:
			// disk state flags and configuration constants
			// disable peripheral multiplexer selection, return to standard GPIO 
			PORT->Group[GPIO_PORTA].PINCFG[11].bit.PMUXEN = 0; 
			// set the output pin high or low, according to the register state
			if (driveRegisters[selectedDriveRegister])
				PORT->Group[GPIO_PORTA].OUTSET.reg = (1 << 11);
			else
				PORT->Group[GPIO_PORTA].OUTCLR.reg = (1 << 11);
			// TODO: also change the PA11 output in the main loop, if the selected register
			// changes its value
			break;
	}
}

You'll notice that EXTINT4 (the PH3 signal on the disk interface) isn't actually used in this code, but it will be needed later for the write latch.

The default case of the switch statement is about what you'd expect: it uses four of the inputs to construct a 4-bit address, then uses that address to access an array of 16 internal drive registers. Then it sets the output pin high or low, depending on the internal register value.

Addresses 7, 8, and 9 get special handling. These aren't really registers, but are pass-throughs of the drive motor tachometer signal or of the instantaneous read head data from the top or bottom of the disk. They're not static values, but rather are constantly changing streams of data. I plan to implement the tachometer using the timer/counter peripheral, and the read head data using the SPI peripheral. All of these functions share the same pin, PA11. The code must enable and disable the peripheral pin remapping functions as needed.
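The handler's last TODO, updating PA11 from the main loop when the selected register changes value, might look something like the sketch below. It is written against simulated globals so it can run on a PC: the `sim_` variables stand in for the handler's globals and for the output pin, and `set_drive_register` is a hypothetical helper that doesn't appear in the post. On real hardware the pin write would be an OUTSET or OUTCLR register write, and the store-then-check sequence would probably need brief protection from the interrupt handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated stand-ins (assumptions for this sketch only). */
static volatile uint32_t sim_selected_register; /* address latched by the ISR */
static volatile uint32_t sim_drive_registers[16];
static volatile bool sim_output_pin;             /* stands in for PA11 */

/* Hypothetical main-loop helper: store a new register value, and refresh
   the output pin if that register is the one currently selected. */
void set_drive_register(uint32_t index, bool value)
{
    index &= 0xFu;
    sim_drive_registers[index] = value ? 1u : 0u;
    if (index == sim_selected_register)
        sim_output_pin = value;
}
```

When address 7, 8, or 9 is selected the pin is driven by a peripheral rather than the GPIO output, so a refresh like this would have no visible effect in those cases.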

After finishing this speculative interrupt handler code, I compiled it in Atmel Studio, using gcc with -O2 optimization. Then I viewed the .lss file to see what code the compiler generated:

00000c70 <EIC_Handler>:
 c70:	481f      	ldr	r0, [pc, #124]	; (cf0 <EIC_Handler+0x80>)
 c72:	4b20      	ldr	r3, [pc, #128]	; (cf4 <EIC_Handler+0x84>)
 c74:	6942      	ldr	r2, [r0, #20]
 c76:	4920      	ldr	r1, [pc, #128]	; (cf8 <EIC_Handler+0x88>)
 c78:	f002 021f 	and.w	r2, r2, #31
 c7c:	6142      	str	r2, [r0, #20]
 c7e:	6a1a      	ldr	r2, [r3, #32]
 c80:	f002 020f 	and.w	r2, r2, #15
 c84:	600a      	str	r2, [r1, #0]
 c86:	680a      	ldr	r2, [r1, #0]
 c88:	2a08      	cmp	r2, #8
 c8a:	d012      	beq.n	cb2 <EIC_Handler+0x42>
 c8c:	2a09      	cmp	r2, #9
 c8e:	d010      	beq.n	cb2 <EIC_Handler+0x42>
 c90:	2a07      	cmp	r2, #7
 c92:	f893 204b 	ldrb.w	r2, [r3, #75]	; 0x4b
 c96:	d01a      	beq.n	cce <EIC_Handler+0x5e>
 c98:	f36f 0200 	bfc	r2, #0, #1
 c9c:	f883 204b 	strb.w	r2, [r3, #75]	; 0x4b
 ca0:	4816      	ldr	r0, [pc, #88]	; (cfc <EIC_Handler+0x8c>)
 ca2:	680a      	ldr	r2, [r1, #0]
 ca4:	f850 2022 	ldr.w	r2, [r0, r2, lsl #2]
 ca8:	b9ea      	cbnz	r2, ce6 <EIC_Handler+0x76>
 caa:	f44f 6200 	mov.w	r2, #2048	; 0x800
 cae:	615a      	str	r2, [r3, #20]
 cb0:	4770      	bx	lr
 cb2:	f893 204b 	ldrb.w	r2, [r3, #75]	; 0x4b
 cb6:	f042 0201 	orr.w	r2, r2, #1
 cba:	f883 204b 	strb.w	r2, [r3, #75]	; 0x4b
 cbe:	f893 2035 	ldrb.w	r2, [r3, #53]	; 0x35
 cc2:	2102      	movs	r1, #2
 cc4:	f361 1207 	bfi	r2, r1, #4, #4
 cc8:	f883 2035 	strb.w	r2, [r3, #53]	; 0x35
 ccc:	4770      	bx	lr
 cce:	f042 0201 	orr.w	r2, r2, #1
 cd2:	f883 204b 	strb.w	r2, [r3, #75]	; 0x4b
 cd6:	f893 2035 	ldrb.w	r2, [r3, #53]	; 0x35
 cda:	2104      	movs	r1, #4
 cdc:	f361 1207 	bfi	r2, r1, #4, #4
 ce0:	f883 2035 	strb.w	r2, [r3, #53]	; 0x35
 ce4:	4770      	bx	lr
 ce6:	f44f 6200 	mov.w	r2, #2048	; 0x800
 cea:	619a      	str	r2, [r3, #24]
 cec:	4770      	bx	lr
 cee:	bf00      	nop
 cf0:	40002800 	.word	0x40002800
 cf4:	41008000 	.word	0x41008000
 cf8:	2000063c 	.word	0x2000063c
 cfc:	200005f8 	.word	0x200005f8

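I don't know much about ARM assembly, but I can count 44 instructions in the listing. Any one invocation follows a single path through the switch, so it executes fewer than that: somewhere between 20 and 26 instructions, depending on the case. Even so, that looks pretty dubious for execution in 48 clock cycles, because many of those instructions are loads, stores, and branches that can take more than one cycle each. A couple of cache misses, or multi-cycle branches, or anything else that requires more than one clock per instruction, and the interrupt handler will be too slow to work. And if I attempt to add the missing write latch logic, the code will almost certainly be too slow. Even just an if() test to see whether the write latch was written would probably be too much extra code.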

Meanwhile the microcontroller will be running the main application, responding to user input, updating the display, and streaming disk data. Occasionally the main loop will need to do an atomic operation, requiring interrupts to be disabled for a few clock cycles. If an external pin changes state during that time, the interrupt handler will be delayed by a few clock cycles.

The interrupt handler shown above is appropriate for one of the Floppy Emu's many disk emulation modes. In other modes, a different behavior is needed. A real interrupt handler would need some more if() checks at the beginning to perform different actions depending on the current emulation mode. This would add a few clock cycles more.

Even reaching this "almost fast enough" level would require some minor heroics. I'm fairly certain the interrupt handler code would need to be in RAM, not flash, to minimize or eliminate flash wait states. Even RAM might not be enough - it might need to be placed in the special "tightly coupled memory" region. The vector table itself probably also needs to be relocated from flash to RAM or TCM. This should be theoretically possible, but it's the sort of uncommon thing for which good documentation and examples are hard to find, and that eats up lots of development time.

To make a long story short - it doesn't look like it's going to work. And even if it did work, it might be such a pain in the ass that it negates any gain I'd get by eliminating the CPLD. And yet it looks pretty close to working, at least within a factor of two if not less. If the timing requirement were 1000 ns instead of 500 ns, I think I could make it work.

 
Other Interrupt Oddities

According to the docs I've read, interrupt handlers on ARM are just like any other function. There's no special interrupt prologue or epilogue, and there's no RTI return from interrupt instruction. And yet gcc does specify an interrupt attribute for ARM functions:

__attribute__ ((interrupt))

The code in Atmel Start doesn't appear to use that attribute for its interrupt handlers. So is it needed or not? What does it do? As best as I can tell, it adds some extra code that aligns the stack pointer upon entry to the interrupt handler, but why? If I add the interrupt attribute to my EIC_Handler(), it gets many instructions longer.

Another unanswered question is how to handle nested interrupts. EIC_Handler wouldn't be the only interrupt handler in the firmware, but it should be the highest priority. If another interrupt handler is running when an external pin changes state, that handler should be pre-empted and EIC_Handler should be started. The Cortex M4 supports nested interrupts, but is there any extra code needed in the interrupt handlers to make it work correctly? Extra registers that must be pushed and popped? I'm not sure, but this discussion suggests the answer is yes. If so, that would add still more instructions to the interrupt handler, making it even slower.


Thoughts on Low Latency Interrupt Handling

How quickly can a modern microcontroller respond to an external interrupt? Is it possible to achieve consistent sub-microsecond response times, so that external glue logic like muxes could be replaced with software instead? That’s the question I raised at the end of my previous post. If it’s possible, then a hypothetical future redesign of the Floppy Emu could be built using a single fast microcontroller, instead of the present design that combines a slower microcontroller and a CPLD for programmable logic.

 
Defining the Challenge

When Floppy Emu is emulating a 3.5 inch floppy drive, the computer controls it using an interface similar to a 16-entry 1-bit memory. Or 16 1-bit registers. The contents of these registers are mostly status flags, like whether a disk is inserted, the disk is write-protected, or the head is at track 0. But some of the “registers” are actually dynamically changing values, like the instantaneous data bit at the current head position of the rotating disk, or the tachometer signal from the disk’s motor rotation.

Here I’ve renamed the actual signal names on the interface to help make things clearer:

A3..A0 – The memory address
R – The memory output bit (when reading memory)
WE – Write-enable

For reading data, whenever the address bits A3..A0 change, the value of R must be updated within 500ns. It’s like a memory with a 500ns access time. Also whenever a status flag changes, or one of the dynamic values changes, R must be updated if A3..A0 already contains the address of the value that changed.

This is exactly the operation of a 16:1 multiplexer.

For writing data, at a positive edge of WE, the register at address A2..A0 must be written with the bit from A3. WE will remain high for 1000ns before it’s deasserted. Given this design, only eight of the sixteen registers are writable.
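Ignoring timing entirely, the read and write behavior can be modelled in a few lines of C. This is just a logical sketch with made-up function names. It packs the sixteen readable bits into one 16-bit word, and it keeps the eight writable bits in a separate 8-bit latch as a modelling assumption, since the relationship between the two sets of registers isn't spelled out here.

```c
#include <stdint.h>

/* Read side: sixteen 1-bit registers packed into a 16-bit word.
   The address A3..A0 selects which bit appears on R. */
unsigned mux16_read(uint16_t registers, unsigned address)
{
    return (registers >> (address & 0xFu)) & 1u;
}

/* Write side: on a rising edge of WE, the bit on A3 is stored in the
   latch bit addressed by A2..A0. Takes the current latch contents and
   the 4-bit address, and returns the updated latch. */
uint8_t latch8_write(uint8_t latch, unsigned address)
{
    unsigned index = address & 0x7u;        /* A2..A0 */
    unsigned data = (address >> 3) & 1u;    /* A3 */
    return (uint8_t)((latch & ~(1u << index)) | (data << index));
}
```

The hard part is not this logic, which is trivial, but producing the `mux16_read` result on a pin within 500ns of every address change.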

These timing requirements and the interface details are taken from this spec for the Apple 1.44MB Superdrive controller chip. The Apple 400K/800K drives may have different timing requirements, but I’m assuming they’re the same, or else more forgiving than the 1.44MB drive requirements.

So the challenge is this: the Floppy Emu microcontroller must respond to reads within 500ns, and to writes within a 1000ns write-enable signal window.

 
Choosing the Hardware

There are a bazillion microcontroller options, which is great, but also daunting. Some mcus have features that could make them well-suited to this job, like high clock speeds, dual cores, special peripherals, or programmable logic. The choice is also influenced by my desire for a mainstream mcu, with broad availability, good documentation and community support, good development tools, and a positive long-term outlook. This leads me to eliminate some options like the Parallax Propeller and Cypress PSoC.

For this analysis, I’ll assume the microcontroller is an Atmel SAMD51. If I were actually building this hardware now, that’s what I’d probably choose. The SAMD51 is a fairly new 120 MHz ARM Cortex M4 microcontroller, and is like an upgraded version of the popular SAMD21 used in the Arduino Zero. Adafruit had a gushing review of the SAMD51 when it was released last year. It has a nice selection of hardware peripherals, including some programmable logic, and it’s fairly fast, and cheap.

The SAMD51 is a single-core mcu. As we’ll see, it’s unlikely that a second core would help anyway.

 
SAMD51 Peripherals

An interesting peripheral on the SAMD51 is the Parallel Capture Controller, and it looks perfect for handling writing data. At the edge of an external clock signal (or WE signal in this example), the value on up to 11 other external pins is recorded and stored in a buffer. Then an interrupt is raised, so that software can examine and process the stored value. If necessary, I think it’s also possible to connect the PCC to the DMA controller, so that incoming values are automatically moved to a memory buffer, and there’s no chance of an overrun if the mcu doesn’t process the data quickly enough. This should guarantee that when writing data, no write is ever missed, although the mcu may not necessarily immediately react to the write.

Using the PCC, I think I can check the box for writing data, and assume it will work fine on the SAMD51.

What peripherals might help with reading data? The SAMD51 has an event system, enabling its peripherals to be chained together in custom ways, without any involvement from the CPU core. For example, using the event system, an edge transition on an external pin can trigger an SPI transmission to begin. Or when SPI data is received, it can trigger an external output pin to go low, high, or toggle. It’s very clever, but after looking at the details, I couldn’t see any obvious way to use the event system to handle reading data.

The SAMD51 also has a programmable logic peripheral called the CCL, Configurable Custom Logic. This looks like exactly the right kind of thing to help with reading data, and it is, but there’s simply not enough of it. It’s like an inferior version of one-quarter of a 16v8 PAL. There’s a total of just four LUTs, and each LUT has only three inputs, so it’s quite limited. The linkage between LUTs is also hard-coded, making it difficult to combine multiple LUTs to create more complex functions. The LUT inputs and outputs can be external pins, other LUTs, or certain peripheral ports, but not arbitrary registers or memory locations. In practice I don’t think the CCL can handle reading data for Floppy Emu, although it might help with it in some small way.
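To give a feel for how little logic that is, here is a small software model of one 3-input LUT. It is a sketch of the general lookup-table idea rather than of the CCL's actual register interface, and the names are invented for illustration. The three inputs form an index into an 8-bit truth table, and the selected bit is the output.

```c
#include <stdint.h>

/* Model of a single 3-input lookup table: in2..in0 form a 3-bit index
   into the 8-bit truth table, and the indexed bit is the output. */
unsigned lut3_eval(uint8_t truth_table, unsigned in2, unsigned in1, unsigned in0)
{
    unsigned index = ((in2 & 1u) << 2) | ((in1 & 1u) << 1) | (in0 & 1u);
    return (truth_table >> index) & 1u;
}

/* Truth table for a 2:1 mux: output = in2 ? in1 : in0.
   This uses all three inputs of one LUT. */
#define LUT3_MUX2_TRUTH 0xCAu
```

A 2:1 mux already consumes an entire LUT, and a 16:1 mux built as a tree of 2:1 muxes would need fifteen of them, so four LUTs fall well short even before considering where the register values would come from.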

After looking at all the hardware peripherals, none of them seem well-suited to handling reading data. The best solution looks like a plain old interrupt. Whenever A3..A0 changes, it will trigger an interrupt, and the interrupt handler code will update R with the new value. Will it be fast enough?

 
Interrupt Handlers

Here’s some pseudocode for the interrupt handlers. First, handling writing data with the PCC:

PCC_Interrupt_Handler()
{
	capturedPins = PCC_DATA; // read the captured value once
	registerNumber = (capturedPins & 0x07); // get A2..A0
	registerData = (capturedPins & 0x08) >> 3; // get A3 data bit

	internalState[registerNumber] = registerData;

	// set status flags here to step track, eject disk, etc. 
	// the main loop will do the actual work

	clearInterrupt(PCC);
}

Second, handling reading data with an external pin change interrupt. From my examination of the datasheet, it appears there’s only a single interrupt vector for external interrupts, and the interrupt handler must examine another register to determine which pins actually triggered the interrupt. That means the same handler must not only check the signals described above for reading data, but also other signals that require interrupt handling, like writeRequest (used when the computer writes to the disk) and multiple enable signals (used to select one of several disks that may be present).

bool driveEnabled = false;

EIC_Interrupt_Handler()
{
	if (EIC_INTFLAG & ENABLE_PIN_MASK)
	{
		// enable input has changed
		EIC_INTFLAG = ENABLE_PIN_MASK; // clear interrupt (writing a 1 bit clears the flag)
		driveEnabled = (PIN_STATE & ENABLE_PIN_MASK);
		if (driveEnabled)
			PIN_MODE_OUTPUT_ENABLE |= R_PIN_MASK;
		else
			PIN_MODE_OUTPUT_ENABLE &= ~R_PIN_MASK;
	}

	if (driveEnabled)
	{
		if (EIC_INTFLAG & WRITE_REQUEST_PIN_MASK)
		{
			// writeRequest input has changed
			EIC_INTFLAG = WRITE_REQUEST_PIN_MASK; // clear interrupt (writing a 1 bit clears the flag)
			writeState = (PIN_STATE & WRITE_REQUEST_PIN_MASK);

			// set status flags here to handle beginning and ending
			// of disk sector writes in the main loop
		}

		if (EIC_INTFLAG & ADDR_PINS_MASK)
		{
			// the A3..A0 input pins have changed
			EIC_INTFLAG &= ~ADDR_PINS_MASK; // clear interrupt
			registerNumber = ((PIN_STATE & ADDR_PINS_MASK) >> ADDR_PINS_SHIFT); // get A3..A0

			if (internalState[registerNumber])
				PIN_OUTPUT_VALUE |= R_PIN_MASK; // set R to 1
			else
				PIN_OUTPUT_VALUE &= ~R_PIN_MASK; // set R to 0

			if (registerNumber == INSTANTANEOUS_DISK_DATA_REGISTER)
				PIN_MUX[R_PIN] = PERIPHERAL_SPI;
			else if (registerNumber == MOTOR_TACHOMETER_REGISTER)
				PIN_MUX[R_PIN] = PERIPHERAL_TIMER_COUNTER;
			else
				PIN_MUX[R_PIN] = GPIO;
		}
	}
}

There’s some extra code about enable and write request. For the address, the interrupt handler must also adjust the MCU’s pin mux to control what’s actually driving the output on the R pin. In most cases it’s a GPIO, and the value comes from the internalState[] array and is set in the PIN_OUTPUT_VALUE register. But for some addresses, the selected value is a dynamically changing quantity that comes from an active SPI peripheral, or a timer/counter peripheral.

 
Interrupt Priority and Pre-emption

EIC_Interrupt_Handler should be given the highest interrupt priority, higher than interrupts for other events like button pushes or SD card data transfers. With a higher priority, I’m fairly certain the EIC_Interrupt_Handler will interrupt any other interrupt handler that might be running at the time. Isn’t that what’s meant by the “nested” part of the ARM’s Nested Vectored Interrupt Controller (NVIC)?

What about the PCC_Interrupt_Handler, for writing data? Should it have the same priority, or a lower one? Should reads interrupt writes? Can that ever actually happen? Does it matter? I’m not sure.

Can the EIC_Interrupt_Handler interrupt itself? If A0 changes, and EIC_Interrupt_Handler begins to run, and then A1 changes, will the handler be interrupted by a second invocation of the same handler? I think the answer is no. But what probably happens is that the interrupt flag will be set again, and as soon as EIC_Interrupt_Handler finishes, the interrupt will trigger again and EIC_Interrupt_Handler will run again. That seems inefficient, but it’s probably OK.

 
Interrupt Timing

Now we come to the critical question: can EIC_Interrupt_Handler respond to changes on A3..A0 with a new value on R within 500ns?

My research suggests the answer is maybe, but it will be difficult. I found two discussion threads where people were attempting to do something similar with Atmel SAM Cortex-M4 and Cortex-M7 microcontrollers. The first used a 300 MHz SAME70, and found a 300ns latency to the start of the interrupt handler. The second used a 120 MHz SAM4E and found a 200ns latency to the start of the handler. These are the delays from the input pin transition to when the interrupt handler begins to run, and they don’t include the actual execution time of the interrupt handler, which is probably several hundred nanoseconds more.

Why so slow? First, the Cortex M4 has a built-in interrupt latency of 12 clock cycles. That’s to do whatever the hardware does for interrupt processing – save the execution state, fetch the interrupt vector, and whatever other voodoo is required. At 120 MHz that’s already 100ns gone.

Then the first instruction of the interrupt handler code must be fetched from internal flash memory. At 120 MHz, the flash isn’t fast enough to supply data in a single clock cycle. It requires 5 wait states, so a read from flash memory needs 6 total clock cycles. That’s another 50ns. So even in the theoretical best-case performance, it will still be a minimum of 150ns before the interrupt handler can begin to run. The two real-world examples I mentioned above were slower.

What about these flash wait states? Does it mean that every instruction in the interrupt handler will need 6 clock cycles to load from flash? I don’t understand the details, but the answer is no. There’s some prefetching and caching happening. Also most instructions are 16 bits wide, and the flash has a 128 bit width, so several instructions can be prefetched and cached at the same time. At least for straight line code with no jumps, I’m guessing that the rest of the interrupt handler can run at speeds approaching 1 instruction per clock cycle at 120 MHz. If anybody knows of good reference data for this, please let me know.

If the flash wait states are a major problem, it may be possible to copy the interrupt handler code to RAM and run it from there. I’m assuming the internal RAM has zero wait states, but I might be wrong on that point.

So 150ns before the interrupt handler can begin to run leaves 350ns remaining. That’s 42 clock cycles at 120 MHz. So the interrupt handler can be up to 42 instructions long, on its longest execution path? Not quite, because some common instructions like STR require two clock cycles. Assuming an average time of 1.5 clock cycles per instruction, those 42 clock cycles are only enough for 28 instructions. Can EIC_Interrupt_Handler be implemented in only 28 Thumb assembly instructions? Um… maybe?
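The budget arithmetic above is easy to get wrong when the numbers change, so here is a minimal C sketch of it that can be compiled on a desktop machine. The function names and the fixed-point `cpi_x10` parameter (average cycles per instruction, times ten) are my own inventions for illustration, and the model is only as good as its inputs: the 12-cycle entry latency, the 6-cycle first flash fetch, and the 1.5 cycles-per-instruction average are the estimates from this post, not measured values.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a duration in nanoseconds to whole CPU clock cycles, rounded down. */
uint32_t ns_to_cycles(uint32_t ns, uint32_t cpu_hz)
{
    assert(cpu_hz > 0);
    return (uint32_t)(((uint64_t)ns * cpu_hz) / 1000000000ULL);
}

/* Convert a cycle count to nanoseconds, rounded up to be pessimistic. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    assert(cpu_hz > 0);
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL + cpu_hz - 1) / cpu_hz);
}

/*
 * Estimate how many instructions the handler may execute on its longest
 * path and still meet the deadline.
 *
 *   deadline_ns        - allowed time from pin change to valid output
 *   cpu_hz             - core clock frequency
 *   entry_cycles       - hardware interrupt entry latency (assumed)
 *   first_fetch_cycles - cycles to fetch the first handler instruction (assumed)
 *   cpi_x10            - average cycles per instruction, times 10 (assumed)
 *
 * Returns 0 if the fixed overhead alone already uses up the deadline.
 */
uint32_t handler_instruction_budget(uint32_t deadline_ns, uint32_t cpu_hz,
                                    uint32_t entry_cycles,
                                    uint32_t first_fetch_cycles,
                                    uint32_t cpi_x10)
{
    assert(cpi_x10 > 0);
    uint32_t total = ns_to_cycles(deadline_ns, cpu_hz);
    uint32_t overhead = entry_cycles + first_fetch_cycles;
    if (overhead >= total)
        return 0;
    return ((total - overhead) * 10u) / cpi_x10;
}
```

Calling `handler_instruction_budget(500, 120000000, 12, 6, 15)` reproduces the figure of 28 instructions, and `cycles_to_ns(18, 120000000)` gives the 150ns of fixed overhead. Any time the main code spends with interrupts disabled would have to be added to the overhead as well.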

 
Complicating Factors

A few other factors raise the difficulty bar further. If the main code ever disables interrupts, or performs any atomic operations, it will delay running of the EIC_Interrupt_Handler and cut further into that 500ns window. In code that uses lots of interrupts, sometimes it’s impossible to avoid needing critical sections where interrupts are briefly disabled, for example to check some value and then set another value based on the first one. Failure to do this can cause rare but serious bugs, if an interrupt intervenes between reading the first value and setting the second.

Another serious complication is the possibility of multiple back-to-back invocations of EIC_Interrupt_Handler. What happens if one of the A3..A0 inputs changes immediately after execution of the line:

registerNumber = ((PIN_STATE & ADDR_PINS_MASK) >> ADDR_PINS_SHIFT); // get A3..A0

The remaining code will output the value of R for the old A3..A0, then the interrupt handler will finish, then a new interrupt will trigger and the handler will be invoked again to process the new A3..A0 input state. The total latency from the change on A3..A0 to the final correct output value of R will be something like 1.5 times the latency for the normal case. In a system where the timing margins are already very tight, that may be enough to break it entirely.

I don’t see any way around this back-to-back invocation problem. Moving EIC_INTFLAG &= ~ADDR_PINS_MASK to the end of EIC_Interrupt_Handler wouldn’t help anything. It would actually clear the pending interrupt flag from the second change of A3..A0 without ever responding to it, resulting in incorrect behavior.
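To convince myself of that last point, here is a small desktop simulation in C of the two orderings. Everything in it is hypothetical scaffolding: `sim_t` stands in for the pins and the interrupt flag, `sim_change_address()` plays the role of the hardware setting the flag on a pin change, and the `glitch_addr` argument injects a second address change right after the handler has sampled A3..A0. It assumes a single pending flag with no queueing, which is how I understand the hardware to behave, and it says nothing about real timing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGISTERS 16

typedef struct {
    uint8_t addr_pins;                     /* simulated A3..A0 inputs */
    bool    irq_pending;                   /* simulated address-change interrupt flag */
    bool    r_pin;                         /* simulated R output */
    uint8_t internal_state[NUM_REGISTERS]; /* one bit of state per register */
} sim_t;

/* Simulated hardware: a change on the address pins sets the pending flag. */
void sim_change_address(sim_t *s, uint8_t new_addr)
{
    assert(new_addr < NUM_REGISTERS);
    s->addr_pins = new_addr;
    s->irq_pending = true;
}

/*
 * One invocation of the handler. If glitch_addr >= 0, the address pins
 * change to that value immediately after the handler has sampled them.
 */
void sim_handler(sim_t *s, bool clear_flag_first, int glitch_addr)
{
    if (clear_flag_first)
        s->irq_pending = false;            /* clear at the start, as in the pseudocode */

    uint8_t reg = s->addr_pins;            /* sample A3..A0 */

    if (glitch_addr >= 0)
        sim_change_address(s, (uint8_t)glitch_addr); /* pins change mid-handler */

    s->r_pin = (s->internal_state[reg] != 0); /* drive R for the sampled address */

    if (!clear_flag_first)
        s->irq_pending = false;            /* clear at the end: also wipes the new event */
}

/* Run the handler until nothing is pending. The glitch is injected only
 * during the first invocation. Returns the number of invocations. */
int sim_run(sim_t *s, bool clear_flag_first, int glitch_addr)
{
    int invocations = 0;
    while (s->irq_pending) {
        sim_handler(s, clear_flag_first, invocations == 0 ? glitch_addr : -1);
        invocations++;
    }
    return invocations;
}
```

With the flag cleared first, a mid-handler change from register 1 to register 2 results in two invocations and R ends up reflecting register 2. With the flag cleared last, the handler runs only once and R is left showing the stale value for register 1.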

 
Conclusions

So can this work – is software interrupt processing viable with these kinds of timing requirements? Is there some optimization trick I can use in the interrupt handler to improve things? Should I even spend the time to attempt it? Maybe there’s some clever way to use the built-in CCL programmable logic that I’ve overlooked, to help accelerate the interrupt handler or even replace it entirely? Or should I just write off this idea as too difficult and too problematic, and continue using a separate programmable logic chip for a mux and glue logic? Decisions…
