BMOW title
Floppy Emu banner

Archive for the 'Bit Bucket' Category

Where Did All The Watts Go?

How efficient is a typical 5V AC-to-DC power supply? I’m digging through my box of assorted power supplies from past projects, looking for something appropriate for a large LED matrix, and noticed the one pictured above. 110V 0.8A input – that’s 88 watts. 5V 4A output – that’s 20 watts. Is the power supply really only 22.7% efficient, or is my reasoning flawed somehow? I would have expected a switching power supply like this one to have much better efficiency than that. A quick search for information on “level IV power supplies” suggests they must be 85%+ efficient, so something doesn’t add up.

Read 12 comments and join the conversation 

Limiting SD Card Inrush Current

I’m experimenting with methods to limit the inrush current when an SD card is inserted, and beginning to wonder whether my solutions are worse than nothing.

When an SD card is inserted into a board that’s already powered on, a large amount of current will flow briefly, as the card’s internal capacitors are charged through its 3.3V supply pin. This is called inrush current. If the inrush current is too large, it can overtax the main board’s voltage regulator and capacitors, causing the board’s supply voltage to drop temporarily. If the voltage drops far enough, it may cause the board’s microcontroller to do a brownout reset. That’s what happened with early versions of the Floppy Emu. It wasn’t really a problem, because you’ll almost always want to perform a reset anyway after inserting a new card, but it was slightly annoying.

In later versions of the Floppy Emu, I added a 1 uH inductor and 10 uF capacitor for the SD card, as shown in the circuit schematic above. Later the capacitor was changed to a 33 uF tantalum. The purpose of the inductor was to limit the inrush current, preventing the main board’s supply voltage from sagging and causing a brownout. And it worked, mostly, as confirmed by observing the main supply and SD card supply voltages on a scope during card insertion. The exact behavior depended on the brand and type of SD card and the card’s internal circuitry. Some types of cards still caused a brownout reset when hot-inserted, but it was rare.

Revisiting this question again recently, I noticed that the inductor created a new issue that may be worse than the one I was trying to solve. When the SD card is inserted, its 3.3V supply pin doesn’t go cleanly from disconnected to connected. Instead it bounces and wiggles over a period of microseconds to milliseconds, just like the contacts of a mechanical switch. As a result, the inrush current isn’t one single burst, but a series of short on/off current pulses. Because of the presence of the inductor in the circuit, these pulses create voltage spikes on the SD card’s 3.3V supply pin. They’re brief – lasting about 100 ns – but some of the spikes go above 4V. Despite their brevity, I’m wondering if they’re high enough to damage the SD card.

Using an inductor seems to be a pretty standard solution for SD card inrush current, but I’ve never seen any discussion of the voltage oscillation and spikes this can cause for the card’s supply. An alternative is a power management IC with “soft start” behavior, but I’m not interested in adding extra chips in this case. I’m starting to think it may be best to remove the inductor, and connect the card’s 3.3V supply directly to the board’s 3.3V supply. Better to cause a nuisance brownout due to high inrush current, than to risk damaging the card with voltage spikes – and still have brownouts sometimes anyway. Have you ever dealt with this topic? How did you address it?

Read 15 comments and join the conversation 

LED PWM Without Resistors

I’ve been working on version 3 of Star Ring, my fun but useless circular LED blinker, and I had an idea. Do I really need current-limiting resistors for the LEDs? Or can I effectively control the LED current using PWM?

Normally a resistor is always needed in series with an LED. Without the resistor, a very high current would flow, destroying the LED. The max DC forward current is specified in the LED datasheet, and is typically something like 30 mA. Often the datasheet will also specify a peak forward current, higher than the DC max, for applications where the LED is rapidly switched on and off with PWM. But how can I calculate what the current will be, if there’s no resistor? As usual, the answer is in the datasheet.

Star Ring is an ATTINY84A microcontroller powered by a 3V CR2032 watch battery. For the sake of this analysis, we’ll assume I’m using this Lite-On dual-color LED with independent red and green elements. The datasheets for the microcontroller and the LED both include graphs of voltage vs current:

The more current that’s drawn from an ATTINY I/O pin, the more the voltage will sag below the 3V supply. And the more current that’s passed through the LED, the more the voltage across it will rise. At some point these two graphs will intersect. You have to exchange the X and Y axes of the LED graph, and extrapolate the ATTINY graph a little, but for the green LED the crossing point is about 2.0V at 25 mA. In the absence of any current-limiting resistor, that’s what you’ll get – theoretically.

25 mA is below the LED’s 30 mA max current rating, so that’s fine. It’s also below the ATTINY’s 40 mA max current per I/O pin, so that’s fine too (although it’s curious why the ATTINY IV graph only goes up to 20 mA). So for this combination of hardware, it looks like current-limiting resistors aren’t needed at all. If 25 mA is more current than is desired, PWM can be used to reduce the average current to something lower.

In reality, the DC current without a resistor will be lower than 25 mA, because of the internal resistance of the CR2032 battery, which I’ve estimated is something like 13 ohms. At 25 mA, even with a fresh 3V battery, the supply voltage to the ATTINY will only be 2.675V due to the battery internal resistance. The I/O pin output voltage will sag further below 2.675V, and those two IV curves will cross at a different point with a current lower than 25 mA. A simple experiment could find the exact number.

Ignoring My Own Advice

Despite this conclusion, I’m probably going to keep the current-limiting resistors, for a few reasons. The graph for the red LED is different than the green, and the no-resistor current will be higher, potentially at an unsafe level. I’m also uncertain whether 25 mA is really OK for the ATTINY – it’s below the 40 mA absolute max rating, but the rest of the datasheet implies 20 mA is the recommended maximum, and there’s no max continuous vs peak current distinction I can see for the I/O pin. I’m also concerned about what happens during debugging or a software crash when my PWM code unexpectedly stops running, and the LED is left constantly on until I can manage to do a reset or kill the power. The resistors will provide some valuable insurance, even if they’re not absolutely required.

Be the first to comment! 

Raspberry Pi GPIO Programming in C

The Raspberry Pi’s 40-pin GPIO connector often gets overlooked. Typical Pi projects use the hardware as a very small desktop PC (RetroPie, Pi-hole, media center, print server, etc), and don’t make any use of general-purpose IO pins. That’s too bad, because with a little bit of work, the Raspberry Pi can make a powerful physical computing device for many applications.

Raspberry Pi vs Arduino (and other microcontrollers)

Why would you want to use a Raspberry Pi instead of an Arduino or other microcontroller (STM32, ATSAM, PIC, Propeller)? There are loads of “Raspberry Pi vs Arduino” articles on the web, and in my view almost all of them miss the mark. The Pi is not a better, more powerful Arduino. It’s a completely different type of device, better at some tasks, but markedly worse at others.

The Pi is vastly more powerful than something like an Arduino Uno. The latest Pi 3 Model B+ has an 88x faster CPU clock and 500,000x more RAM than the Uno. It also runs a full-fledged Linux operating system, so it’s much easier to create projects involving high-level functions like networking or video processing. And you can connect a standard keyboard, mouse, and monitor, and use it as a normal computer.

But the Pi operating system is also a huge weakness in many applications. There’s no “instant on”, because it takes nearly a minute for the device to boot up. There’s no appliance-like shutoff either – the Pi must be cleanly shutdown before power is turned off, or else the operating system files may get corrupted. And real-time bit twiddling of GPIO is mostly impossible, because the kernel may swap out your process at any moment, making precise timing unpredictable.

In theory it’s possible to do bare-metal programming on the Raspberry Pi, eliminating Linux and its related drawbacks for real-time applications. Unfortunately this doesn’t seem to be a common practice, and there’s not much information available about how to do it. So the Pi is probably best for those applications where you need some major CPU horsepower and some kind of GPIO connection to other sensors or equipment, but don’t need precise real-time behavior or microsecond-level accuracy.

GPIO and Python?

If you start Googling for “Raspberry Pi GPIO programming”, you’ll quickly discover that most of the examples use the Python language. In fact, this seems to be the most popular way by far to use GPIO on the Pi.

I have nothing against Python, but I’m a C programmer through and through, and the idea of using a high-level language for low-level digital interfaces is unappealing. By one measure, Python is over 300x slower at Raspberry Pi GPIO manipulation than plain C. I’m sure there are applications where it’s OK to throw away 99.7% of potential performance, but I’ll be sticking with C, thank you very much.

I spent a little time researching four different methods of Raspberry Pi GPIO manipulation in C. This involved reading documentation and data sheets, and examining the source code of various libraries. I haven’t yet tried writing any code using these methods, so take my impressions accordingly.

If any of the authors of these C libraries happen to read this – thank you for your work, and please don’t be offended by any criticisms I may make. I understand that creating an IO library necessarily involves many tradeoffs between simplicity, speed, flexibility, and ease of use, and not everyone will agree on the best path.

Direct Register Control – No Library

The GPIO pins on the Raspberry Pi can be directly accessed from C code, similarly to how it’s done on the ATMEGA or other microcontrollers. A few different memory-mapped control registers are used to configure the pins, and to read input and set output values. The only big difference is that the code must first call mmap() on /dev/mem or /dev/gpiomem, to ask the kernel to map the appropriate region of physical memory into the process’s virtual address space. If that means nothing to you, don’t worry about it. Just copy a couple of dozen lines of code into your program’s startup routine to do the mmap, and the rest is fairly easy.

Here’s an example of reading the current value of GPIO 7:

if (gpio_lev & (1<<7))
  // pin is high
  // pin is low

Just test a bit at a particular memory address - that's it. This looks more-or-less exactly like reading GPIO values on any other microcontroller. gpio_lev is a memory-mapped register whose address was previously determined using the mmap() call during program initialization. See section 6 of the BCM2835 Peripherals Datasheet for details about the GPIO control registers.

Setting the output value of GPIO 7 is similarly easy:

gpio_set |= (1<<7); // sets pin high

gpio_clr |= (1<<7); // sets pin low

Using other control registers, it's possible to enable pull-up and pull-down resistors, turn on special pin functions like SPI, and change the output drive strength.

Watch out for out-of-order memory accesses! The datasheet warns that the system doesn't always return data in order. This requires special precautions and the use of memory barrier instructions. For example:

a_status = *pointer_to_peripheral_a; 
b_status = *pointer_to_peripheral_b;

Without precautions, the values ending up in the variables a_status and b_status can be swapped. If I've understood the datasheet correctly, a similar risk exists for GPIO writes. Although data always arrives in order at a single destination, two different updates to two different peripherals may not be performed in the same order as the code. These out-of-order concerns were enough to discourage me from trying direct register IO with my programs.

Wiring Pi

WiringPi wraps the Raspberry Pi GPIO registers with an API that will look very familiar to Arduino users: digitalRead(pin), digitalWrite(pin, value). It's a C library, but third parties have added wrappers for Python and other high-level languages. From a casual search of the web, it looks like the most popular way to do Raspberry Pi GPIO programming in C.

WiringPi appears to be designed with flexibility in mind, at the expense of raw performance. Here's the implementation of digitalRead():

int digitalRead (int pin)
  char c ;
  struct wiringPiNodeStruct *node = wiringPiNodes ;

  if ((pin & PI_GPIO_MASK) == 0)		// On-Board Pin
    /**/ if (wiringPiMode == WPI_MODE_GPIO_SYS)	// Sys mode
      if (sysFds [pin] == -1)
	return LOW ;

      lseek  (sysFds [pin], 0L, SEEK_SET) ;
      read   (sysFds [pin], &c, 1) ;
      return (c == '0') ? LOW : HIGH ;
    else if (wiringPiMode == WPI_MODE_PINS)
      pin = pinToGpio [pin] ;
    else if (wiringPiMode == WPI_MODE_PHYS)
      pin = physToGpio [pin] ;
    else if (wiringPiMode != WPI_MODE_GPIO)
      return LOW ;

    if ((*(gpio + gpioToGPLEV [pin]) & (1 << (pin & 31))) != 0)
      return HIGH ;
      return LOW ;
    if ((node = wiringPiFindNode (pin)) == NULL)
      return LOW ;
    return node->digitalRead (node, pin) ;

That's a lot of code to accomplish what could be done by testing a bit at an address. To be fair, this code does a lot more, such as an option to access GPIO using sysfs (doesn't require root?) instead of memory-mapped registers, and pin number remapping. It also adds a concept of on-board and off-board pins, so that pins connected to external GPIO expanders can be controlled identically to pins on the Raspberry Pi board itself.

From a brief glance through the source code, I couldn't find any use of memory barriers. I'm not sure if the author determined that they're not necessary somehow, or if out-of-order read/writes are a risk.

WiringPi also includes a command line program called gpio that can be used from scripts (or interactively). It won't be high-performance, but it looks like a great tool for testing, or for when you just need to switch on an LED or something else simple.


pigpio is another GPIO library, and appears more geared towards simplicity and speed. And yes, it was quite a while before I recognized the name was Pi GPIO, and not Pig Pio. 🙂

Here's pigpio's implementation of gpioRead():

#define BANK (gpio>>5)
#define BIT  (1<<(gpio&0x1F))

int gpioRead(unsigned gpio)
   DBG(DBG_USER, "gpio=%d", gpio);


   if (gpio > PI_MAX_GPIO)
      SOFT_ERROR(PI_BAD_GPIO, "bad gpio (%d)", gpio);

   if ((*(gpioReg + GPLEV0 + BANK) & BIT) != 0) return PI_ON;
   else                                         return PI_OFF;

Here there's no pin number remapping or other options. The function does some error checking to ensure the library is initialized and the pin number is valid, but otherwise it's just a direct test of the underlying register.

As with WiringPi, I did not see any use of memory barriers in the source code of pigpio.


bcm2835 is a third option for C programmers looking for a Raspberry Pi GPIO library. It appears to have the most thorough and well-written documentation, but also seems to be the least commonly used library of the three that I examined. This may be a result of its name, which is the name of the SoC used on the Raspberry Pi. It's somewhat difficult to find web discussion about this library, as opposed to the chip with the same name.

Like pigpio, bcm2835 appears more focused on providing a thin and fast interface to the Pi GPIO, without any extra options. Here's the implementation of bcm2835_gpio_lev(), the oddly-named read function:

uint32_t bcm2835_peri_read(volatile uint32_t* paddr)
    uint32_t ret;
    if (debug)
		printf("bcm2835_peri_read  paddr %p\n", (void *) paddr);
		return 0;
       ret = *paddr;
       return ret;

uint8_t bcm2835_gpio_lev(uint8_t pin)
    volatile uint32_t* paddr = bcm2835_gpio + BCM2835_GPLEV0/4 + pin/32;
    uint8_t shift = pin % 32;
    uint32_t value = bcm2835_peri_read(paddr);
    return (value & (1 << shift)) ? HIGH : LOW;

The pin number is constrained to the range 0-31, but otherwise there's no error checking. The actual read of the GPIO register is performed by a helper function that includes memory barriers before and after the read.


For my purposes, I would probably choose pigpio or bcm2835, since I prefer a thin API over one with extra features I don't use. Of those two options, I'd tentatively choose bcm2835 due to the format of its documentation and its use of memory barriers. I wish I understood the out-of-order risk better, so I could evaluate whether the apparent absence of memory barriers in the other libraries is a bug or a feature.

Any analysis that looks at just a single API function is clearly incomplete - if you're planning to do Rasbperry Pi GPIO programming, it's certainly worth a deeper look at the many other capabilities of these three libraries. For example, they differ in their support for handling interrupts, or byte-wide reads and writes, or special functions like SPI and hardware PWM.

Did I miss any other C programming options for Raspberry Pi GPIO, or overlooked something else obvious? Leave a note in the comments.

Read 12 comments and join the conversation 

64 x 32 LED Matrix Programming

Everybody loves a big bright panel of LEDs. Way back in 2011, my daughter and I hand-built an 8 x 8 LED matrix Halloween display. More recently I’ve been playing with a 64 x 32 RGB matrix that I purchased on eBay for $26. Each point in the matrix is 3 independent red, green, and blue LEDs in a single package, so that’s 6144 total LEDs of awesomeness. Wow! Similar but smaller 32 x 32 and 32 x 16 RGB matrixes are also available from vendors like Sparkfun and Adafruit, so LED glory is within easy reach.

But wait, how do you control this thing? Is there a simple “hello world” example somewhere? I couldn’t find one. Adafruit has a nice Arduino library, but it’s much more complex than what I was looking for, so the basic details of controlling the matrix are lost in a sea of other code about PWM modulation, double-buffering, interrupt timing, gamma correction, color space conversion, and more. Great stuff, but not a good “how to” example for beginners. There’s also a Raspberry Pi library by Henner Zeller that’s incredibly rich, supporting video playback and other advanced features. But I really just wanted a couple of dozen lines of code to support a basic setPixel() interface, so I wrote one myself.

How is this LED matrix code different from the Adafruit and Henner Zeller libraries? It’s intentionally primitive, and serves a different purpose, for a different audience:

  • Under 100 lines of code: easy to understand how it works.
  • No hardware dependencies – easily portable to any microcontroller.
  • Basic setPixel() interface. No fonts, lines, etc.
  • 8 basic colors (3-bit RGB) only. No PWM for additional colors.
  • Requires only 1 KB RAM for 64 x 32 matrix.

Controlling the Matrix

All the different varieties of LED matrixes have the same style of 16-pin connector for the control signals:

Larger matrixes like mine will also have a “D” input that replaces one of the ground pins. So what do these signals do?

OE – Active low output enable. Set this high to temporarily turn off all the LEDs.
CLK – Clock signal used for shifting color data into the matrix’s internal shift registers.
LAT – When high, data in the shift registers is copied to the LED output registers.
A, B, C (and maybe D) – Selects one of 8 (or 16) rows in the matrix to illuminate.
[R1, G1, B1] – Color data for the top half of the matrix.
[R2, G2, B2] – Color data for the bottom half of the matrix.

A matrix like this one can’t turn on all its LEDs simultaneously – only one row is on at a time. But by cycling through the rows very quickly using the A/B/C inputs, and changing the color data at the same time as selecting a new row, it can fool the eye into seeing a solid 2D image. There’s no built-in support for this row scanning, so the microcontroller that’s generating the matrix inputs must continuously refresh the matrix, cycling through the rows and updating the color data. Any interruption from another calculation or microcontroller task will cause the LED image to flicker or disappear.

Internally the matrix can be imagined as a big shift register (or actually several shift registers in parallel). The shift register size is equal to the matrix’s width. For a 64-wide matrix, the microcontroller shifts in 64 bits of data, generating a positive CLK edge for each bit, and then strobes LAT to load the shift register contents into the LED output register. The LEDs in the currently-selected row then turn on or off, depending on the data that was loaded.

Because each row contains 64 independent red, green, and blue LEDs, three independent shift registers are required. The serial inputs to these shift registers are R1, B1, and G1.

But wait – there’s one more important detail. Internally, a W x H matrix is organized as two independent matrixes of size W x (H / 2), mounted one above the other. These two sub-panels share the same OE, CLK, and LAT control inputs, but have independent color inputs. R1, G1, B1 are the color inputs for the upper sub-panel, and R2, G2, and B2 are for the lower sub-panel.

Putting everything together, we get this recipe for controlling the matrix:

1. Begin with OE, CLK, and LAT low.
2. Initialize a private row counter N to 0.
3. Set R1,G1,B1 to the desired color for row N, column 0.
4. Set R2,G2,B2 to the desired color for row HEIGHT/2+N, column 0.
5. Set CLK high, then low, to shift in the color bit.
6. Repeat steps 3-5 WIDTH times for the remaining columns.
7. Set OE high to disable the LEDs.
8. Set LAT high, then low, to load the shift register contents into the LED outputs.
9. Set ABC (or ABCD) to the current row number N.
10. Set OE low to re-enable the LEDs.
11. Increment the row counter N.
12. Repeat steps 3-11 HEIGHT/2 times for the remaining rows.

Using this method, each pixel in the matrix is defined by a 3-bit color, supporting the 8 basic colors red, green, blue, cyan, magenta, yellow, black, and white. It’s certainly possible to create more colors by using PWM to alter the duty cycle of the red, green, and blue LEDs, but this quickly becomes complex and too CPU-intensive for your average Arduino. The Adafruit library squeezes out 12-bit 4/4/4 color, but it’s hard-coded for only the Arduino Uno and Mega. I chose to keep my example code simple and skip PWM, so the color palette is limited but it’ll run on any ATMEGA (or really on any generic microcontroller).

This method is appropriate for displaying a static image that was previously drawn into a memory buffer. It won’t work for animated images, because the code for updating the animation would need to be interleaved somehow into the code for refreshing the matrix. If you need animation, you’ll need to perform steps 3-11 inside an interrupt service routine that’s executed at roughly 3200 Hz (or 200 Hz times HEIGHT/2). Anything slower than that, and the human eye will begin to perceive flickering.

Here’s my example code using the above method. It refreshes the matrix as quickly as the CPU allows, and provides a very basic setPixel(x, y, color) API. The code was written for a 16 MHz ATMEGA32U4, but it doesn’t use any ATMEGA-specific hardware features, and should be easily portable to any other microcontroller with at least 2 KB of RAM.


My hardware was an ATMEGA32U4 breakout board, connected to the LED matrix with lots of messy jumper wires (this is Big Mess o’ Wires after all). Fortunately the mess is mostly hidden behind the display, but a custom-made adapter or cable would certainly be cleaner. Here’s a peek at the back side:

I discovered one odd behavior that may be specific to this particular LED matrix: if you don’t cycle through the rows quickly enough, the matrix won’t display anything. It doesn’t work if you set the row number (ABCD inputs) to a fixed value for testing purposes, as I did in my initial tests. It took a huge amount of time for me to discover this, and I haven’t seen this behavior documented anywhere else. From my experiments, if you spend more than about 100 ms without changing the ABCD inputs, the display shuts off. Maybe it’s some kind of safety feature.

Displaying a Bitmap Image

Maybe you’re wondering how I rendered the BMOW logo in the title photo? I certainly didn’t write a bunch of code that calls setPixel() thousands of times – that would be painful. Instead, I used GIMP to export a 64×32 image as a C header file, and then wrote some glue code to parse the data and call setPixel() as needed. Here’s how you can do it too.

First you’ll convert your image to indexed color mode.

1. Open a 64 x 32 image (or the size of your matrix) in GIMP.
2. Select the menu Image->Mode->Indexed.
3. In the conversion dialog, select “Generate Optimum Palette”. Set the maximum number of colors to 256.
4. Click the button “Convert”.

Now you can export the converted image as a C header file.

1. Select the menu File->Export As…
2. In the save dialog, set the export type to “C source code header (*.h)”
3. Click the button “Export”.

Open the generated header file in a text editor. You should see a section that begins with data like:

static char header_data_cmap[256][3] = {
	{  0,  0,  0},
	{244, 38,117},
	{ 28,239,250},

and a second section that begins like:

static char header_data[] = {

For the Arduino or ATMEGA microcontrollers with limited RAM, these declarations must be modified to store the data in program memory (flash ROM) instead of in RAM. Other microcontrollers can skip this step.

For both of the above sections, change static char to const unsigned char, and add PROGMEM just before the = sign:

const unsigned char header_data_cmap[256][3] PROGMEM = { ...

const unsigned char header_data[] PROGMEM = { ...

Then use the function LoadROMImage() to read the bitmap data and set the corresponding pixels in the LED matrix framebuffer. Here’s the updated example code:


The bitmap data is palettized color, where each palette entry is a color with 8 bits of red, 8 bits of green, and 8 bits of blue. LoadROMImage() just uses the most significant bit of each color channel to select the 3-bit color for the LED matrix framebuffer. If you want something fancier, you could try some kind of error-minimization code for selecting the 3-bit color instead. But if you care strongly about attractive-looking colors, then you should probably use Adafruit’s PWM library or something similar, instead of this code.

Footnote: I struggled to get a good-looking photo of the LED matrix. It’s very bright and sharp, with deeply saturated colors. But regardless of what camera settings I tried for shutter speed, ISO, and white balance, I got a muddy desaturated mess like the photo seen above. Judging by the other photos of this matrix that I found on the web, I’m not the only person to find the photography challenging.

If the example code here was helpful, or you built something interesting using it, please leave me a note in the comments. Thanks!

Read 3 comments and join the conversation 

More on Fast Interrupt Handling with Cortex M4

Can a fast microcontroller replace external glue logic, while also continuing to run application code? This is the third in a series of posts considering the question. It’s part of a potential simplification of my Floppy Emu disk emulator hardware, whose present design combines an MCU and a CPLD for glue logic. For readers that haven’t seen the first two parts, you can find them here. Read these first, including the comments discussion after the post body. Go ahead, I’ll wait.

Thoughts on Floppy Emu Redesign
Thoughts on Low Latency Interrupt Handling

There are several pieces of CPLD glue logic that I’m hoping to replace with interrupt handlers on a Cortex M4 microcontroller, specifically the 120 MHz Atmel SAMD51 Cortex M4. The most challenging is a piece of logic that behaves like a 16:1 mux, and must respond within 500 ns to any change on its address inputs. There’s also a write function that behaves a little like a 4-bit latch, as well as some enable logic. I haven’t yet done any real hardware testing, but I’ve spent many hours reading datasheets, writing code, and examining compiler output. I’ll save you the suspense: I don’t think it’s going to work. But it’s close enough to keep it interesting.

Coding an Interrupt Handler

A 120 MHz MCU means there are 120 clock cycles per microsecond. To meet the 500 ns (half a microsecond) timing requirement for the mux logic, the MCU needs to do its work in 60 clock cycles. Cortex M4 has a built-in interrupt latency of 12 clock cycles before the interrupt handler begins to run, so that leaves just 48 clock cycles to do the actual work. At best that’s enough time for 48 instructions. In reality it will be fewer than 48, due to pipeline issues, cache misses, branches, flash memory wait states, and the fact that some instructions just inherently take more than one clock cycle. But 48 is the theoretical upper bound.

I spent a while digging through the heavily-abstracted (or should I say obfuscated) code of Atmel Start, the hardware abstraction library provided for the SAMD51. Peeling back the layers of Start, I wrote a minimal interrupt handler that directly manipulates the MCU configuration registers for maximum speed, rather than using the Start API. I ignored the write latch and the enable logic for the moment, and just wrote an interrupt handler for the 16:1 mux function. Bearing in mind this code has never been run on real hardware, here it is:

volatile uint32_t selectedDriveRegister;
volatile uint32_t driveRegisters[16];

void EIC_Handler(void) 
	// a shared interrupt handler for changes on five different external pins:

	// EXTINT0 = PA00 = SEL - interrupt on rising or falling edge
	// EXTINT1 = PA01 = PH0 - interrupt on rising or falling edge
	// EXTINT2 = PA02 = PH1 - interrupt on rising or falling edge
	// EXTINT3 = PA03 = PH2 - interrupt on rising or falling edge
	// EXTINT4 = PA04 = PH3 - interrupt on rising edge
	// PA11 = output

	uint32_t flags = EIC->INTFLAG.reg; // a 1 bit means a change was detected on that pin

	// clear EXTINT0-4 flags, if they were set.
	EIC->INTFLAG.reg = (flags & 0x1F); // writing a 1 bit clears the interrupt flags

	// mask the 4 lowest bits and use them as the address of the desired drive register
	selectedDriveRegister = PORT->Group[GPIO_PORTA].IN.reg & 0xF; 

	// don't need to check if drive is enabled. 
	// output enable will be handled externally in a level shifter.
	switch (selectedDriveRegister)
		case 7: 
			// motor tachometer
			// enable peripheral multiplexer selection
			PORT->Group[GPIO_PORTA].PINCFG[11].bit.PMUXEN = 1; 
			// choose TIMER/COUNTER1 peripheral
			PORT->Group[GPIO_PORTA].PMUX[11>>1].bit.PMUXO = MUX_PA11E_TC1_WO1; 

		case 8:
			// disk data side 0
			// enable peripheral multiplexer selection
			PORT->Group[GPIO_PORTA].PINCFG[11].bit.PMUXEN = 1; 
			// choose SERCOM0 peripheral
			// TODO: main loop must check selectedDriveRegister to see if it's 8 or 9 when adding 
			// new bytes to SPI

		case 9:
			// disk data side 1
			// enable peripheral multiplexer selection
			PORT->Group[GPIO_PORTA].PINCFG[11].bit.PMUXEN = 1; 
			// choose SERCOM0 peripheral
			// TODO: main loop must check selectedDriveRegister to see if it's 8 or 9 when adding
			// new bytes to SPI

			// disk state flags and configuration constants
			// disable peripheral multiplexer selection, return to standard GPIO 
			PORT->Group[GPIO_PORTA].PINCFG[11].bit.PMUXEN = 0; 
			// set the output pin high or low, according to the register state
			if (driveRegisters[selectedDriveRegister])
				PORT->Group[GPIO_PORTA].OUTSET.reg = (1 << 11);
				PORT->Group[GPIO_PORTA].OUTCLR.reg = (1 << 11);
			// TODO: also change the PA11 output in the main loop, if the selected register
			// changes its value

You'll notice that EXTINT4 (the PH3 signal on the disk interface) isn't actually used in this code, but it will be needed later for the write latch.

The default of the switch statement is about what you'd expect: it uses four of the inputs to construct a 4-bit address, then uses that address to access an array of 16 internal drive registers. Then it sets the output pin high or low, depending on the internal register value.

Addresses 7, 8, and 9 get special handling. These aren't really registers, but are pass-throughs of the drive motor tachometer signal or of the instantaneous read head data from the top or bottom of the disk. They're not static values, but rather are constantly changing streams of data. I plan to implement the tachometer using the timer/counter peripheral, and the read head data using the SPI peripheral. All of these functions share the same pin, PA11. The code must enable and disable the peripheral pin remapping functions as needed.

After finishing this speculative interrupt handler code, I compiled it in Atmel Studio, using gcc with -O2 optimization. Then I viewed the .lss to see what code the compiler generated:

00000c70 <EIC_Handler>:
 c70:	481f      	ldr	r0, [pc, #124]	; (cf0 <EIC_Handler+0x80>)
 c72:	4b20      	ldr	r3, [pc, #128]	; (cf4 <EIC_Handler+0x84>)
 c74:	6942      	ldr	r2, [r0, #20]
 c76:	4920      	ldr	r1, [pc, #128]	; (cf8 <EIC_Handler+0x88>)
 c78:	f002 021f 	and.w	r2, r2, #31
 c7c:	6142      	str	r2, [r0, #20]
 c7e:	6a1a      	ldr	r2, [r3, #32]
 c80:	f002 020f 	and.w	r2, r2, #15
 c84:	600a      	str	r2, [r1, #0]
 c86:	680a      	ldr	r2, [r1, #0]
 c88:	2a08      	cmp	r2, #8
 c8a:	d012      	beq.n	cb2 <EIC_Handler+0x42>
 c8c:	2a09      	cmp	r2, #9
 c8e:	d010      	beq.n	cb2 <EIC_Handler+0x42>
 c90:	2a07      	cmp	r2, #7
 c92:	f893 204b 	ldrb.w	r2, [r3, #75]	; 0x4b
 c96:	d01a      	beq.n	cce <EIC_Handler+0x5e>
 c98:	f36f 0200 	bfc	r2, #0, #1
 c9c:	f883 204b 	strb.w	r2, [r3, #75]	; 0x4b
 ca0:	4816      	ldr	r0, [pc, #88]	; (cfc <EIC_Handler+0x8c>)
 ca2:	680a      	ldr	r2, [r1, #0]
 ca4:	f850 2022 	ldr.w	r2, [r0, r2, lsl #2]
 ca8:	b9ea      	cbnz	r2, ce6 <EIC_Handler+0x76>
 caa:	f44f 6200 	mov.w	r2, #2048	; 0x800
 cae:	615a      	str	r2, [r3, #20]
 cb0:	4770      	bx	lr
 cb2:	f893 204b 	ldrb.w	r2, [r3, #75]	; 0x4b
 cb6:	f042 0201 	orr.w	r2, r2, #1
 cba:	f883 204b 	strb.w	r2, [r3, #75]	; 0x4b
 cbe:	f893 2035 	ldrb.w	r2, [r3, #53]	; 0x35
 cc2:	2102      	movs	r1, #2
 cc4:	f361 1207 	bfi	r2, r1, #4, #4
 cc8:	f883 2035 	strb.w	r2, [r3, #53]	; 0x35
 ccc:	4770      	bx	lr
 cce:	f042 0201 	orr.w	r2, r2, #1
 cd2:	f883 204b 	strb.w	r2, [r3, #75]	; 0x4b
 cd6:	f893 2035 	ldrb.w	r2, [r3, #53]	; 0x35
 cda:	2104      	movs	r1, #4
 cdc:	f361 1207 	bfi	r2, r1, #4, #4
 ce0:	f883 2035 	strb.w	r2, [r3, #53]	; 0x35
 ce4:	4770      	bx	lr
 ce6:	f44f 6200 	mov.w	r2, #2048	; 0x800
 cea:	619a      	str	r2, [r3, #24]
 cec:	4770      	bx	lr
 cee:	bf00      	nop
 cf0:	40002800 	.word	0x40002800
 cf4:	41008000 	.word	0x41008000
 cf8:	2000063c 	.word	0x2000063c
 cfc:	200005f8 	.word	0x200005f8

I don't know much about ARM assembly, but I can count 44 instructions. Already that looks pretty dubious for execution in 48 clock cycles. A couple of cache misses, or multi-cycle branches, or anything else that requires more than one clock per instruction, and the interrupt handler will be too slow to work. And if I attempt to add the missing write latch logic, the code will almost certainly be too slow. Even just an if() test to see whether the write latch was written would probably be too much extra code.

Meanwhile the microcontroller will be running the main application, responding to user input, updating the display, and streaming disk data. Occasionally the main loop will need to do an atomic operation, requiring interrupts to be disabled for a few clock cycles. If an external pin changes state during that time, the interrupt handler will be delayed by a few clock cycles.

The interrupt handler shown above is appropriate for one of the Floppy Emu's many disk emulation modes. In other modes, a different behavior is needed. A real interrupt handler would need some more if() checks at the beginning to perform different actions depending on the current emulation mode. This would add a few clock cycles more.

Even reaching this "almost fast enough" level would require some minor heroics. I'm fairly certain the interrupt handler code would need to be in RAM, not flash, to minimize or eliminate flash wait states. Even RAM might not be enough - it might need to be placed in the special "tightly coupled memory" region. The vector table itself probably also needs to be relocated from flash to RAM or TCM. This should be theoretically possible, but it's the sort of uncommon thing that's often difficult to find good documentation or examples about, and that eats up lots of development time.

To make a long story short - it doesn't look like it's going to work. And even if it did work, it might be such a pain in the ass that it negates any gain I'd get by eliminating the CPLD. And yet it looks pretty close to working, at least within a factor of two if not less. If the timing requirement were 1000 ns instead of 500 ns, I think I could make it work.

Other Interrupt Oddities

According to the docs I've read, interrupt handlers on ARM are just like any other function. There's no special interrupt prologue or epilogue, and there's no RTI return from interrupt instruction. And yet gcc does specify an interrupt attribute for ARM functions:

__attribute__ ((interrupt))

The code in Atmel Start doesn't appear to use that attribute for its interrupt handlers. So is it needed or not? What does it do? As best as I can tell, it adds some extra code that aligns the stack pointer upon entry to the interrupt handler, but why? If I add the interrupt attribute to my EIC_Handler(), it gets many instructions longer.

Another unanswered question is how to handle nested interrupts. EIC_Handler wouldn't be the only interrupt handler in the firmware, but it should be the highest priority. If another interrupt handler is running when an external pin changes state, that handler should be pre-empted and EIC_Handler should be started. The Cortex M4 supports nested interrupts, but is there any extra code needed in the interrupt handlers to make it work correctly? Extra registers that must be pushed and popped? I'm not sure, but this discussion suggests the answer is yes. If so, that would add still more instructions to the interrupt handler, making it even slower.

Read 15 comments and join the conversation 

« Newer PostsOlder Posts »