
Archive for the 'Bit Bucket' Category

Explaining 4K 60Hz Video Through USB-C Hub

USB-C offers exciting new capabilities, including external monitors connected through the USB port. USB-C converters to DisplayPort or HDMI are common and inexpensive. USB-C hubs with external monitor support are also common, but understanding their capabilities and limitations can be extremely confusing. Some are Mac-compatible and some aren’t. Some need driver software. Supported resolutions and refresh rates vary widely. Some are advertised as “not for gaming use”. There are mentions of alt mode and dual mode and more. Prices range from $20 to over $300 for what look like very similar features. What’s going on here?

This is the guide to high-resolution video over USB-C that I wish I’d had. If you’re hoping to connect a high-res external display to your USB-C equipped computer, read on.

 
Forget About HDMI

Lesson 1 is to focus on DisplayPort video connections, and forget about HDMI. You’ll never find a USB-C hub that offers better video capabilities through its HDMI port than through its DisplayPort, but you will find hubs whose DisplayPort output supports higher resolutions and refresh rates than their HDMI output. I strongly suspect most hubs with an HDMI port are actually implemented internally as DisplayPort, with an integrated DisplayPort-to-HDMI converter. This is because DisplayPort video can be carried more efficiently on the USB-C connection than HDMI for the same resolution and refresh rate.

Dual-mode DisplayPort (DP++) connectors are able to function as HDMI connectors with a simple passive adapter that does 3.3V to 5V level conversion. Regular DisplayPort connectors can’t do this, and require an active HDMI adapter with more built-in smarts. Otherwise I’m not aware of any difference between the two DisplayPort types.

 
Bandwidth Tradeoffs – It’s All About The Lanes

The 24-pin USB-C connector is the key to understanding what’s possible. The diagrams below are from techdesignforums.com.

USB-C connectors have four differential pairs called “lanes” for carrying high-speed data. There’s also a fifth differential pair, D+ and D-, that carries old-style USB 2.0 data.

Let’s look at what happens when DisplayPort is added into the mix:

USB 3.1 Gen 2 only uses two of the four lanes, as shown in the top two rows of this table. The other two lanes are essentially wasted (they will be used by USB 3.2). These two lanes can be repurposed to carry a native DisplayPort signal, using what’s called DisplayPort Alternate Mode, as shown in the middle table rows. In this case the USB-C connector functions like a DisplayPort connector with a different shape and some extra wires for USB data. There’s no loss of USB 3.1 performance. To the computer and the external monitor, this looks exactly like a regular DisplayPort connection.

Two lanes for DisplayPort provide enough bandwidth for one external monitor at up to 4K 30Hz. That’s OK for watching movies, but a 30 Hz Windows or macOS desktop experience is painful. To keep a 60 Hz refresh rate, you need to step down to 2K or lower resolution.

If you want 4K 60Hz, 5K, or multiple external monitors, then you’ll need to use DisplayPort Alternate Mode with all four lanes for DisplayPort data, as shown in the bottom rows of the table. To the computer and the external monitor, this still looks exactly like a regular DisplayPort connection. But now there are no lanes remaining for USB 3.1 data. There’s only the old D+/D- pair providing slower USB 2.0 data. That means any USB-C hub using this technique for 4K60 video can’t have any USB 3.1 ports on it.
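
For a rough sense of why two lanes top out around 4K30, here’s the back-of-the-envelope arithmetic. This is only a sketch: it assumes HBR2 link rates (5.4 Gbit/s per lane, with 8b/10b encoding leaving 4.32 Gbit/s usable) and 24-bit color, and it ignores blanking intervals and protocol overhead, which push the real requirements somewhat higher.

#include <stdio.h>

/* Back-of-the-envelope DisplayPort bandwidth check.
   Assumes HBR2 lanes: 5.4 Gbit/s raw, 8b/10b encoding leaves 4.32 Gbit/s usable.
   Pixel rates ignore blanking, so the true requirements are a bit higher. */
int main(void)
{
	const double lane = 5.4 * 8.0 / 10.0;                /* usable Gbit/s per lane */
	double need4k30 = 3840.0 * 2160.0 * 30 * 24 / 1e9;   /* ~6.0 Gbit/s */
	double need4k60 = 3840.0 * 2160.0 * 60 * 24 / 1e9;   /* ~11.9 Gbit/s */

	printf("2 lanes carry %.1f Gbit/s; 4K30 needs about %.1f\n", 2 * lane, need4k30);
	printf("4 lanes carry %.1f Gbit/s; 4K60 needs about %.1f\n", 4 * lane, need4k60);
	return 0;
}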

External DisplayPort monitors can also be supported using zero dedicated lanes for DisplayPort Alternate Mode, with one of two approaches. If the computer’s USB-C port has Thunderbolt 3 capability, then DisplayPort data can be encapsulated within the Thunderbolt data stream. The video data becomes just one more type of packetized data multiplexed with everything else. Thunderbolt 3 has enough bandwidth to support multiple 4K60 video connections this way, with enough bandwidth remaining for USB 3.1 data too.

This is great, but Thunderbolt 3 hubs are expensive, and the computer must have Thunderbolt 3 capability, and many computers don’t. This also looks different to the computer – unlike DisplayPort Alternate Mode, there are no native DisplayPort signals and no direct connection to the computer’s GPU. It’s not clear to me whether there’s a performance penalty for treating video this way, or if it’s all handled magically by the chipset with no loss of performance. My hunch is there’s no performance penalty. If you know more, please tell me.

 
DisplayLink

The other method of supporting external monitors with zero dedicated lanes is DisplayLink. This technology compresses the video data on the host side, sends it over a USB 3.1 connection as generic data, and reconverts it to video on the other end using a special chip like the DL-6950. Conceptually it’s like a remote desktop connection for sharing your work computer’s screen when you’re logged in from home, except everything happens locally on your desktop.

DisplayLink is nice for squeezing high-resolution video over a lower-bandwidth connection like USB, or for supporting multiple high-res external monitors without Thunderbolt. But if you have any alternative, I think DisplayLink is best avoided. Here are some disadvantages:

  • Host-side driver software is required. Driver availability and compatibility for Mac/Linux are spotty to non-existent. This is why some USB-C hubs are advertised as not Mac-compatible.
     
  • The driver software can slow your computer. It implements a virtual graphics card performing on-the-fly compression of video data, which adds some CPU overhead.
     
  • When the computer is very busy or there’s a lot of other USB traffic, video artifacts will appear. You’ll see pixelation, stuttering, frame dropouts, and other problems. This is why some USB-C hubs are advertised as “not for gaming use”.

USB-C hubs utilizing DisplayLink work fundamentally differently than the others, but you probably wouldn’t realize that from reading the product descriptions and technical specs on Amazon or Newegg. If you don’t know what you’re looking for, it’s easy to buy a DisplayLink-based hub without realizing it, and suffer its shortcomings unnecessarily.

 
TL;DR – What are the Options?

Putting all this knowledge together, we can group USB-C hubs into four categories based on how they treat video. Here are some examples in each category.

 
4 Lanes for Video

These support external monitors up to 4K60, or possibly 5K, but can only provide USB 2.0 data. That’s not the fastest, but it’s enough for keyboards and mice and basic printers. They should work on any computer that supports DisplayPort Alternate Mode, and typically cost around $30.

Cable Matters 201046 $38 – 1x DisplayPort, power, ethernet, 1x USB2
Cable Matters 201055 $58 – 2x DisplayPort, power, ethernet, 2x USB2
Monoprice 24274 $28 – 1x DisplayPort, power
Cable Matters 201026 $20 – 1x DisplayPort, power
Baseus B07P713FPD $25 – 1x DisplayPort, power

 
2 Lanes for Video

These support external monitors up to 4K30 as well as USB 3.1 data. Many are advertised as simply “4K” without mentioning the refresh rate. They should work on any computer that supports DisplayPort Alternate Mode, and typically cost around $30-$150.

HooToo HT-UC001 $34 – 1x HDMI, 3x USB3, power, card reader
OmniMaster B07KRMRJZD $55 – 1x HDMI, 1x mini DisplayPort, power, ethernet, 2x USB3, card reader, mic
Anker AK-A83310A1 $40 – 1x HDMI, 3x USB3, ethernet
Vava VA-UC006 $45 – 1x HDMI, 3x USB3 Ports, power, ethernet, card reader
StarTech DK30C2DAGPD $114 – 2x DisplayPort (switchable 2 or 4 lanes), power, ethernet, 2x USB2/3

 
0 Lanes for Video – DisplayLink

These support multiple external monitors up to 4K60, or possibly 5K, as well as USB 3.1 data. But they generally are only compatible with Windows computers, not Macs or Linux machines, and they have other performance drawbacks. They cost around $150-$200.

Plugable UD-3900 $89 – includes 1x HDMI, 1x DVI
Plugable UD-ULTC4K $193 – includes 2x DisplayPort, 1x HDMI
Plugable UD-6950H $149 – includes 2x DisplayPort, 2x HDMI
SIIG JUDK0811S1 $199 – includes 2x DisplayPort, 2x HDMI

 
0 Lanes for Video – Thunderbolt 3

These support two external monitors up to 4K60, or possibly 5K, as well as USB 3.1 data. They should work on any computer that has Thunderbolt 3 support. They are the most expensive option, with a typical cost around $250 to $300.

OWC OWCTB3DK12PSG $249 – includes 1x mini DisplayPort, 1x Thunderbolt display
Plugable TBT3-UDV $249 – includes 1x DisplayPort, 1x Thunderbolt display
Cable Matters 107014 $239 – includes 1x HDMI, 1x Thunderbolt display
Kensington SD5200T $239 – includes 1x DisplayPort, 1x Thunderbolt display
Elgato 10DAA4101 $250 – includes 1x DisplayPort, 1x Thunderbolt display
Belkin F4U095tt $296 – includes 1x DisplayPort, 1x Thunderbolt display
CalDigit TS3 $310 – includes 1x DisplayPort, 1x Thunderbolt display

 
The Liars

Finally, we have an interesting category of off-brand USB-C hubs costing around $30 that claim 4K60 video support and USB 3.1 data support. Search Amazon and you’ll find quite a few of these. Based on our knowledge of USB-C and DisplayPort, we now know this is impossible without using DisplayLink or Thunderbolt 3. These products are all lying about their capabilities! They are very likely DisplayPort Alternate Mode designs using four lanes. They may have blue USB ports labeled “USB 3.1”, but as many of the reviews attest, they only provide USB 2.0 data speeds.

Koopman B07J4XSSXV $24 – 1x HDMI, power, 1x USB
WBPINE HUB3-1 $20 – 1x HDMI, power, 1x USB
Koopman B07M5DMYKY $32 – 1x HDMI, power, 3x USB
NEWPOWER B07PQ5GZK1 $30 – 1x HDMI, power, 3x USB

What’s been your experience with external monitors connected by USB-C? Leave a note in the comments.

 


Take a Tour of BMOW Labs

They say you can tell a lot about a person by looking at his workspace. I thought it would be fun to take a break from stuffing boxes today, and make a short video tour of the BMOW Lair. I’ve mentioned before that it’s not a large space – just a single room about 150 square feet / 14 square meters. All BMOW engineering development, order fulfillment, and storage is crammed into this one room. It used to be a home study, but BMOW projects and supplies have slowly taken over and there’s barely any free floorspace left. So come on, take a look inside…


Part Selection and Schmitt Trigger Oscillator

I often obsess over little details of my circuit designs, and the daisy-chain adapter for Floppy Emu is no exception. The design needs a small CPLD for the daisy-chaining logic, and for various reasons I have narrowed the choices to the Lattice ispMACH LC4032ZE and LC4032V. These are both 32 macrocell CPLDs, and are very similar except for a few details:

LC4032ZE – 48 pins, 0.5 mm pin pitch, 1.8V core, built-in oscillator

LC4032V – 44 pins, 0.8 mm pin pitch, 3.3V core

The 4032ZE is the newer of the two options, and its supply at distributors is a bit more plentiful. It also has a built-in 5 MHz RC oscillator with +/- 30% accuracy, which can be divided down to the kHz range or lower frequencies without using any macrocells. As it happens, the daisy chain adapter needs a clock source in the kHz range for periodic tasks, but the exact frequency isn’t too important, so this is perfect.

The drawbacks of the 4032ZE are its core voltage and its pin pitch. With a 1.8V core serving 3.3V I/O to and from 5V disk drives, I’d need to design a three-voltage system. In practice that means an additional voltage regulator, some extra decoupling capacitors, and a bit more headache with the board layout. 0.5 mm pin pitch means the pins are very tightly spaced. It creates a greater likelihood of soldering errors and hard-to-see solder shorts during assembly. Basically it will make assembly and testing of boards more challenging.

The 4032V looks like a good alternative, with a 3.3V core and a much wider pin pitch. But it lacks any built-in oscillator. If I want a clock source, even an inaccurate one, I’ll have to provide one externally. That will add a bit to the board cost and complexity. The 4032V itself is also slightly more expensive than its twin. In the end, it’s not obvious to me whether the 4032V or 4032ZE is the better choice overall.

Which one would you choose?

 
Schmitt Trigger Oscillator

If I choose the 4032V, then I’ll be looking for a simple and inexpensive way to provide an external clock signal to it. Something around 10 kHz would be preferred. I can probably tolerate inaccuracies in the frequency of 50% or more, over time on the same board or between different boards.

I could use a single chip oscillator like a MEMS oscillator, but I’m drawn instead to the idea of a Schmitt Trigger RC oscillator. It’s cheaper, and it also has a nice old-school vibe. The circuit is simply a single inverter with its output fed back to its input through a resistor, and with its input also connected through a capacitor to ground.

The frequency of the Schmitt Trigger RC oscillator depends on the values of the capacitor and resistor, the hysteresis of the inverter, and the supply voltage. Calculators exist to help predict the frequency, but in practice I’d probably need to tune it experimentally.
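
For reference, here’s the formula those calculators use, with some example numbers plugged in. It’s only a sketch: the 3.3V supply and the threshold voltages below are assumed values for a generic single-gate Schmitt inverter, not figures from any particular datasheet, and R and C are just plausible starting points.

#include <stdio.h>
#include <math.h>

/* Estimated frequency of a Schmitt trigger RC oscillator.
   The cap charges from VT- up to VT+ through R, then discharges back down, so
   T = R*C*( ln((VDD-VT-)/(VDD-VT+)) + ln(VT+/VT-) ).
   R, C, the supply, and the thresholds below are assumed example values. */
int main(void)
{
	double R = 100e3, C = 1e-9;                  /* 100K ohms, 1 nF */
	double vdd = 3.3, vtHi = 1.8, vtLo = 1.1;    /* assumed supply and hysteresis thresholds */

	double period = R * C * (log((vdd - vtLo) / (vdd - vtHi)) + log(vtHi / vtLo));
	printf("estimated frequency: %.1f kHz\n", 1.0 / period / 1000.0);   /* about 11 kHz */
	return 0;
}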

I’m fine with some variation in the frequency, as long as it doesn’t vary wildly. A variance of 2x or more could become problematic. Given the tolerance of the capacitor and resistor values, temperature-dependent capacitance changes, process variations between inverters, and possible supply voltage fluctuations, what range of frequency variation is a Schmitt Trigger RC oscillator likely to experience?

Would I be better off paying more for a standard oscillator, even though I don’t need the high accuracy? Or would I be better off using the 4032ZE with its built-in oscillator, and not stressing about the core voltage and pin pitch?


The Demon Razor that Wouldn’t Turn Off

What do you do when a battery-powered appliance won’t turn off? And when it’s a sealed unit, so removing the batteries is impossible? And when its body starts to grow disturbingly warm? That’s the situation I found myself in a few days ago.

 
Riddles in the Dark

I was working at home one night, and gradually became aware of a strange buzzing sound. Initially I thought the sound was outside, but when I went to investigate, I discovered it was coming from the bathroom. My skull shaver, plugged in and recharging, had mysteriously turned itself on and the blades were spinning away. Pressing the on/off button had no effect. Unplugging the charging cable had no effect. The body is a single piece of molded plastic, so there was no non-destructive way of opening it. Nothing could stop the whirrrrrrrrrr of the blades, and the shaver was noticeably warm.

I started to panic that the razor would explode. The internal battery is likely lithium polymer, and from my days with RC cars and aircraft I know that defective or damaged LiPos can fail catastrophically. Like literally go boom and eject flaming molten goo everywhere that burns down your house.

I quickly took the razor outside, and set it on the concrete patio, blades whirring this whole time. A couple of minutes later, I began to fear that it was still too close to the house if it exploded, so I moved it to the street. Thankfully it didn’t explode, and those blades kept whirring for 90 minutes, during which two people stopped to ask what the horrible noise was.

 
A Tale of Two Chargers

So what caused the skull shaver to go crazy? Bad charging. Besides this manly pink skull shaver, I also own a more conventional Norelco cordless shaver. I’d never noticed it before, but the chargers for the two shavers have the same plug at the end of their cords:

A quick check confirmed that yes, I’d accidentally plugged the skull shaver into the Norelco charger. Is that bad? You might think that the plug shape is standardized, and that all charger plugs with this shape are designed for the same voltage. Let’s check. Here’s the skull shaver charger, which is nicely labeled. 5V output, max 1000 mA:

And here’s the Norelco charger. Instead of a label, its specs are molded into the charger body using impossible-to-read, tiny black-on-black lettering. Yuck.

But if it’s tilted at just the right angle to the light, and you get your reading glasses, here’s what emerges:

15 volts! Ouch! I charged a 5 volt device with a 15 volt charger.

I’m suddenly nostalgic for the days when real on/off switches physically disconnected the power. Many of today’s electronic appliances have a soft on/off switch that’s really just an input to some controller circuitry. When soft switches work, they’re great. But when something goes wrong with the control circuit you suddenly have a zombie appliance that can’t be shut off. In the case of this razor, the 15 volts apparently killed the control circuitry before the LiPo battery could be damaged to the point of explosion by over-charging. And the failure mode of the control circuitry was to fail ON.

Have you ever made a similar charging mistake, or exploded a battery through mistreatment? Leave a comment below and tell your story!


Halloween LED Matrix and PSU Death

In preparation for Halloween this year, I built a large and colorful LED matrix. After programming it with monster-themed animations, it looked fantastic! Months passed, and when October 31st finally arrived, at dusk I hung the LED display outside by the street. It was to be the perfect lure for neighborhood kids.

But when I checked back an hour later, the LED display was dark and dead. Halloween passed in sad form, with no display of animated monsters. What happened?

 
LED Matrix ‘Hello World’

This all started last May, when I bought a generic 64 x 32 LED matrix from eBay. These matrix displays are designed to be controlled by an Arduino, Raspberry Pi, or other microcontroller or FPGA. The smallest displays are the simplest: the control unit selects one row of LEDs to illuminate, and then sends a stream of 0’s and 1’s to turn off or on the individual LEDs in that row. In effect, each row of the display is like a large shift register, with each bit corresponding to a separate LED. By cycling rapidly through all the rows, and providing different data for each row, the control unit can create the appearance of the whole LED matrix being lit. For larger displays, two rows can be illuminated at once, using two separate streams of bits, but otherwise the interface is the same.
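
To make that concrete, here’s a bare-bones sketch of the row-scan idea in Arduino-style C. The pin numbers and the one-bit-per-color frame buffer are made up for illustration, and a real panel also has a second set of R2/G2/B2 data pins feeding the lower half of the display, which this sketch ignores.

// Bare-bones row scan for a 64 x 32 panel with 1:16 multiplexing.
// Pin assignments are hypothetical; frame[][] packs 1 bit per color as 0bRGB.
const int CLK = 2, LAT = 3, OE = 4;
const int ADDR[4] = {5, 6, 7, 8};   // selects which of the 16 rows is active
const int RGB[3] = {9, 10, 11};     // R1, G1, B1 data inputs

uint8_t frame[16][64];              // one row of pixel data per scan line

void setup()
{
	pinMode(CLK, OUTPUT); pinMode(LAT, OUTPUT); pinMode(OE, OUTPUT);
	for (int i = 0; i < 4; i++) pinMode(ADDR[i], OUTPUT);
	for (int i = 0; i < 3; i++) pinMode(RGB[i], OUTPUT);
}

void showRow(int row)
{
	digitalWrite(OE, HIGH);                   // blank the LEDs while shifting
	for (int x = 0; x < 64; x++) {            // shift out 64 pixels, like a big shift register
		for (int c = 0; c < 3; c++)
			digitalWrite(RGB[c], (frame[row][x] >> (2 - c)) & 1);
		digitalWrite(CLK, HIGH);
		digitalWrite(CLK, LOW);
	}
	for (int a = 0; a < 4; a++)               // select the row to illuminate
		digitalWrite(ADDR[a], (row >> a) & 1);
	digitalWrite(LAT, HIGH);                  // latch the new row data
	digitalWrite(LAT, LOW);
	digitalWrite(OE, LOW);                    // light it up
}

void loop()
{
	for (int row = 0; row < 16; row++)        // cycle rapidly through all rows
		showRow(row);
}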

A blog post from May describes my experiences with that first 64 x 32 LED matrix, using custom software I wrote for an Arduino. It worked well, and the matrix was impressively bright and colorful – the photos can’t do it justice. My custom software was limited to displaying static images, with only 8 colors, because the red, green, and blue LEDs were simply on or off with no in-between state.

 
Matrix Upgrade

If one matrix is good, two matrixes must be better! I bought a second identical matrix and connected it to the first. These matrixes are designed to be daisy-chained, with the shift-out from one matrix connected to the shift-in of the next. Logically this results in rows that are twice as long as before, creating a 128 x 32 matrix. But physically I arranged the displays to create a 64 x 64 matrix. As a result, the control software became more complicated with mappings between physical and logical lines.

I quickly abandoned my custom Arduino solution, and adopted the excellent rpi-rgb-led-matrix Raspberry Pi library by Henner Zeller. It’s incredibly rich, supporting many different physical to logical mappings, thousands of colors using PWM, video playback, and many other advanced features. Really, if you’re experimenting with one of these LED matrixes, this is the software you want.

For the Raspberry Pi, I selected a Zero W thanks to its built-in WiFi, small size, and rock-bottom price of $10. The OS is a default Raspbian image configured to run in terminal mode. It’s easy to connect to the Pi over WiFi using ssh, and then use rpi-rgb-led-matrix command-line utilities to display images on the LED matrix. It’s a powerful solution, and the only downsides compared to the Arduino are the few seconds required for the Pi to boot up, and the need to perform a clean shutdown instead of just pulling the power plug.

It’s possible to connect the LED matrix directly to the Pi’s GPIO pins, if you don’t mind a squid-like mass of wires. I chose an easier route and bought the Adafruit RGB Matrix Bonnet, which has the same footprint as the Pi Zero and makes the LED matrix connections a breeze. I performed a simple mod to the Adafruit bonnet in order to enable hardware PWM to reduce flickering, as described further here. After that it was just plug and play, using the --led-gpio-mapping=adafruit-hat-pwm command-line switch for the rpi-rgb-led-matrix software.

Using the advanced search tools from Google Images, I looked for 64 x 64 animated GIFs with monster-related keywords. In short order I was able to locate several dozen. I was in business!

 
Building the Frame

To mount the two LED matrix panels together and create an eye-catching display, I built a custom frame. The design was loosely based on this Instructable by Al Linke. It uses several layers of laser-cut acrylic with a pile of carefully-sized spacers and machine screws. The frame took a substantial amount of work, but the end result looks great.

First the LED panels were mounted on a black acrylic piece, with pre-cut holes for mounting screws, wiring, and the Raspberry Pi. The rear of that piece was an ugly and bumpy tangle of cables that wouldn’t hang flat against the wall, so a second black acrylic piece was mounted behind the first to contain the mess. This piece also has integrated mounting holes for a wall hook or picture hanging wire. A third semi-frosted acrylic panel was then mounted on top, to give the LEDs a more diffused look. This is a matter of taste, but I found that the LED images looked much nicer with the diffuser panel than without.

 
Powering the LEDs

In a 64 x 64 matrix, there are 4096 elements. Each element contains separate red, green, and blue LEDs, so the grand total is 12288 LEDs. Assuming that each LED draws 15 mA (a typical number for a single discrete LED), naive math calculates the total current as a whopping 184 amps! Ouch! But this calculation overlooks the fact that only a few rows are actually illuminated at the same time. This particular matrix uses 1:16 multiplexing, so the maximum current is a much more manageable 11.5 amps.

Armed with this information, I purchased the 5V 10A power supply shown here. Why only 10A instead of 11.5A or more? Enclosed “brick” supplies that can provide more than 10 amps are difficult to find, and I expected I’d never need 11.5 amps anyway for real-world images. 11.5A is a worst-case number for a solid white image where every LED is on. My Halloween monster images are much darker, typically with many black pixels, so the required current should be much less.

 
Failure Analysis

When the LED display died on Halloween night, the Raspberry Pi was unresponsive to WiFi connections and the LEDs were dark. The power indicator on the Pi was blinking on and off. Later I noticed that the blue power indicator on the power supply brick was also blinking on and off. Strange.

I brought everything inside and plugged it back in, with the same result. Nothing worked, and the power indicators blinked. At first I thought there must be a short-circuit somewhere in the LED matrix or the Adafruit bonnet, which was repeatedly tripping some protection circuitry in the power supply. But when I disconnected the Pi and the LED matrix, and tried the power supply alone, I observed the same blinking power indicator. The problem was clearly with the power supply itself. Did I exceed its maximum current rating and kill it?

Confused, I left the hardware unplugged for a few days. Later, when I plugged it in again to begin more troubleshooting, I was surprised that it worked! I reconnected the LED display, and everything was great for about an hour. Then it died a second time, with the same blinking power indicator symptom. Unplugging and replugging didn’t help. But if I left it unplugged for a few hours, it would work again the next time it was plugged in.

By this point I was fairly sure I must be over-taxing the power supply and drawing too much current. I guessed that the supply must contain a thermal fuse, and it was overheating and shutting down. Only after a few hours of cooling would it work again, for a short while. 10 amps was simply not enough, it seemed.

To solve my power needs I purchased this 5V 30A “cage” type supply. I was reluctant, because this type of supply isn’t meant for outdoor use, and because it requires wiring to screw terminals instead of using standard power plugs. I’m a reasonably careful person, but I still get nervous playing with bare wires that carry mains voltages. Unfortunately I didn’t see a good alternative.

Before connecting the 30A supply, I decided to do one more test to see how much current the 10A supply was really using. I don’t have an easy way to directly measure the DC current, so I used a Kill-a-Watt meter to measure the power delivered from the wall outlet to the supply. I cycled the LED matrix through its collection of monster images several times, and the highest power measured by the Kill-a-Watt was 19 watts. Hmmm. If I assume the power supply is 80% efficient, then that means it was supplying about 15 watts to the LED matrix. It’s a 5V supply, so that’s a current of only 3 amps maximum – far below the supply’s claimed max of 10 amps. So why did the supply keep shutting off after an hour of use?

While still connected to the Kill-a-Watt, I let the hardware run for a while. This time it took several hours before it shut down, but the end result was the same as before, with dark LEDs and a blinking power indicator. The Kill-a-Watt showed 1 watt. The power supply didn’t feel hot to the touch, but was only slightly warm. This seemed to rule out my “overheating” theory, but didn’t suggest anything else.

Because I’d measured the max current at only about 3 amps, I decided to try a new approach. I pulled out another brick-type power supply from a different project, this one rated at 5 volts and 4 amps max. I connected it to the LED display, and started everything running. It worked just fine, and 18 hours later it’s still running smoothly. Success! At least for this series of Halloween images, it’s all I need.

I’m still curious what caused the first power supply to fail the way it did. Clearly it’s defective or broken somehow, but I’d like to understand more. A 10 amp power supply shouldn’t have any trouble delivering 3 amps continuously. And if it were actually something like a 2 amp power supply, mislabeled as 10 amp, I would expect it to get obviously hot after extended use. But it was never more than slightly warm. Could there be another explanation?


Cortex M4 Interrupt Speed Test

How quickly can a microcontroller detect and respond to changing inputs? Fast enough to replace a dedicated combinatorial logic chip, like a mux? I finally have some test results to begin answering this question.

My goal here is a potential redesign of the Floppy Emu disk emulator. The current design uses a microcontroller for the high-level logic, and a CPLD for the timing-critical stuff. But if a new microcontroller were fast enough to handle the high-level logic and the timing-critical stuff, I could simplify the design and eliminate the CPLD.

This is the fourth post in a series:

1. Thoughts on Floppy Emu Redesign
2. Thoughts on Low Latency Interrupt Handling
3. More on Fast Interrupt Handling with Cortex M4

 
Background

Let’s consider a mux-like function performed by Floppy Emu’s CPLD, as part of some disk emulation modes. It behaves like a 16-to-1 mux: 16 data inputs, 4 address inputs, and 1 data output. In order to properly emulate a disk drive, the mux must respond to changing address or data inputs within 500 nanoseconds. For my tests, I used an ARM Cortex M4 running at 120 MHz: specifically the Atmel SAMD51 on an Adafruit Metro M4 Express board.

At 120 MHz, 500 nanoseconds is 60 clock cycles: that’s how much time is available between an input’s rising/falling edge and the updated data output. The previous posts in this series examined the datasheets and performed some static code analysis, attempting to decide whether this was realistically possible in 60 clock cycles. The answer was “maybe”, awaiting some real-world timing experiments.

 
Timing It

Here’s a very simple interrupt handler. It doesn’t even attempt to perform the 16-to-1 mux function yet. It sets an output pin high when the handler begins running, and low when it finishes running, so I can monitor the timing with a logic analyzer. The body of the interrupt handler clears the interrupt flags for external interrupts 1, 2, 3, and 6 (where I connected the address inputs). This establishes a lower bound on how fast the real interrupt handler could possibly be, once I’ve added the 16-to-1 mux functionality and many other related pieces of logic.

void EIC_1236_Handler(void) 
{
	PORT->Group[GPIO_PORTA].OUTSET.reg = 1 << 2; // PA2 on

	uint32_t flagsSet = EIC->INTFLAG.reg; // which EIC interrupt flags are set?
	flagsSet &= 0x4E;  // we only act on EIC 6, 3, 2, and 1
	EIC->INTFLAG.reg = flagsSet; // writing a 1 bit clears the interrupt flags.
	
	// now do something, based on which flags were set. More flags may get set in the meantime...

	PORT->Group[GPIO_PORTA].OUTCLR.reg = 1 << 2; // PA2 off
}

Here are the results from the logic analyzer. The inputs PH2, PH1, PH0, and SEL are from a Macintosh Plus querying to test whether a disk drive is present. ISR is the timing output signal from my interrupt handler.

Every time there's an edge on one of the input signals, there's a brief spike on ISR. Looks good. Let's zoom in:

For the highlighted input edge, the delay between a rising edge of PH1 and the start of the interrupt handler is 175 nanoseconds (0.175 µs). Other edges are similar, but not identical. For this sample, the delays ranged between 175 and 250 ns. The width of the ISR pulse (the duration of the interrupt handler) was either 50 or 75 ns. So the total time needed to detect an input edge and run a minimal interrupt handler function is about 225 to 325 ns. That only leaves a few hundred nanoseconds to do the actual work of the interrupt handler, which doesn't seem promising. (The precision of the timing measurements was 25 ns.)

Test conditions:

  • NVRAM line cache was enabled
  • NVRAM wait states set to "auto"
  • L1 instruction/data cache was enabled (it's about 1.6x slower when disabled)
  • edge detection filtering and debouncing were off (these add latency)
  • edge detection was configured for asynchronous (fastest)
  • the main loop never disables interrupts
  • SAMD51 main clock was definitely 120 MHz (confirmed with a scope)

This result is moderately worse than predicted by my static analysis of code and datasheets. Through further tests, I also found that code in my interrupt handler averaged close to 2 clocks per instruction, not the 1 clock per instruction that I'd hoped. That makes sense, because apparently any Cortex M4 instruction that references memory requires a minimum of two clock cycles. When I began writing the 16-to-1 mux code, the duration of the interrupt handler quickly approached 500 ns all by itself, without even considering the delay from input edge to start of the interrupt handler.
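
For context, here’s roughly what the handler body would need to grow into for the mux function. This is only a sketch: the register accesses follow the SAMD51 style of the code above, but the pin assignments (address inputs on PA4-PA7, data output on PA10) and the dataBits variable are hypothetical placeholders.

static volatile uint16_t dataBits; // the 16 emulated data inputs, updated by the main loop

void EIC_1236_Handler(void)
{
	uint32_t flagsSet = EIC->INTFLAG.reg;  // clear the relevant interrupt flags, as before
	flagsSet &= 0x4E;
	EIC->INTFLAG.reg = flagsSet;

	uint32_t inputs = PORT->Group[GPIO_PORTA].IN.reg; // sample all port A pins at once
	uint32_t addr = (inputs >> 4) & 0xF;              // 4 address inputs on PA4-PA7
	if ((dataBits >> addr) & 1)                       // 16-to-1 mux: pick the addressed bit
		PORT->Group[GPIO_PORTA].OUTSET.reg = 1 << 10; // drive the output high
	else
		PORT->Group[GPIO_PORTA].OUTCLR.reg = 1 << 10; // or low
}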

 
What Next?

Given these results, I'm almost ready to give up on this idea, and return to the tried-and-true CPLD-based solution. I say "almost", because I haven't yet written the full 16-to-1 mux functionality and other related logic, and because there are still a few more tricks I could try:

  • relocate the interrupt vector table from NVRAM to RAM
  • relocate the interrupt handler itself from NVRAM to RAM
  • overclock the SAMD51 or try a different microcontroller
  • profile various Macs and Apple IIs, to see if there's any slack in the 500 ns nominal requirement

But at this point, my intuition says this is not the right path. The whole idea of moving timing-critical logic from a CPLD to the microcontroller was to simplify things. There's no reason I must do this - it's just an option. So is it really simplifying things if I need to throw every optimization trick in the book at this problem, just to barely maybe meet the 500 ns timing requirement with no room to spare? What happens when I discover some future bug or requirement that needs a few extra instructions in the interrupt handler, and now it's pushed over 500 ns? Is it really worth abandoning all the time I've spent getting familiar with Atmel's SAM hardware and tools, in order to try some other vendor's part that goes to 150 MHz or 180 MHz? Probably not.

Relying on both a CPLD and a microcontroller surely has some drawbacks: a two-part firmware design, larger board, and slightly higher cost. But it also has a huge benefit: it's a much surer path to getting something that works. I've already done it with the existing Floppy Emu design, and I could make incremental improvements by keeping the same basic approach, but replacing the current CPLD and microcontroller with newer alternatives. I'll stew on this for a while more, but that's where it feels like this is headed, and I'm OK with it.

