BMOW title
Floppy Emu banner

Archive for the 'Bit Bucket' Category

Star Ring: Abusing the PCB Fab

Why use a PCB silkscreen when you could showcase the shiny gold metal layer? Why build a rectangular board when you could have strange and wonderful shapes? And why settle for a dull LED flasher when you could have something crazy? I went a little bit overboard with this one, and I’m unsure how to even describe it. It’s a wearable LED blinky, with some terrible (or amazing?) abuse of the PCB fabrication process, many eye-catching light displays, and careful attention to power usage to ensure long battery life. For lack of any better name, I’ll call it the Star Ring.

 
PCB

The PCB was made at OSH Park. Yes, the fab really will cut PCBs in this shape, and with a large hole in the middle too. Whatever is drawn as the outline layer in the design software, that’s what they’ll cut, so I won’t limit myself to boring rectangles. In this case the board is a 2-inch diameter ring, where a 1.5-inch diameter disc has seemingly been cut out of the center and repositioned behind the ring and overlapping it. In Eagle, it’s actually a many-sided polygon rather than a true circle, but the difference isn’t visible.

Second, the artwork. Rather than using the silkscreen to draw stars, moons, and planets, I put them into the top metal layer. Because this PCB was made with ENIG plating, the artwork appears as shiny gold. It’s a unique look, and it’s also extremely crisp and high resolution, much higher than I would get with the silkscreen layer. Some of those stars are only 0.4 millimeters across, but they still show up clearly when viewed under a 10x magnifying lens:

 
Electronics

The board’s primary electronics are about as simple as you can get: just a microcontroller, a pushbutton, and some LEDs. Even including the CR2032 battery, current-limiting resistors, and a couple of capacitors, it’s a bare minimum of components. The microcontroller is an ATTINY84A, and is hidden on the back:

The ATTINY microcontrollers are like the little brothers of the better-known ATMEGA parts found in the Arduino and Floppy Emu. This was my first time using an ATTINY chip, and I was excited to give it a try. Aside from having less RAM and less flash memory than ATMEGA parts, I was hard-pressed to spot any difference. Those who scour the datasheet will discover that the ATTINY chips have fewer built-in hardware peripherals, or the peripherals have fewer features than their ATMEGA counterparts, but for most people the differences aren’t important. The great thing about ATTINY chips is that they come in small packages and are very inexpensive. This particular chip is only 80 cents in single-unit quantities.

 
Lighting LEDs with PWM

Each of the nine LEDs is connected to an ATTINY output pin, and to an 82 ohm current-limiting resistor. They’re amber LEDs with a forward voltage of 2.0 volts, according to the datasheet. With a 3.0 volt battery, the math says that will create a 1.0 volt potential difference across an 82 ohm resistor, resulting in a current of 12.2 mA. Unfortunately the math is completely wrong.

The CR2032 battery has significant internal resistance of about 15 ohms, which further limits the current. And the ATTINY output pin voltage will droop lower than the supply voltage when it’s delivering many milliamps of current, so the voltage applied to the LED will be something less than 3.0 V. The more LEDs that are lit simultaneously, the more noticeable this effect will become. I attempted to do some complex math and experiments with a dozen different resistor values, in order to find the optimum value, before I concluded that it didn’t really matter. Anything in the 60 to 120 ohm range is probably fine. With the 82 ohm resistors, my tests showed an LED current of 7.9 mA with a single LED lit, and 6.0 mA each with two LEDs lit.

Lighting up all nine LEDs is more challenging. It’s not possible to directly power all nine at the same time, because that would draw more current than the battery and ATTINY can provide. The LEDs would get noticeably dim, and the supply voltage would get pulled down, possibly to a voltage low enough to cause a microcontroller malfunction or reset. Star Ring creates the appearance that all nine LEDs are lit by turning them on and off very quickly, with no more than three LEDs ever turned on at the same instant.

With the necessity of modulating the LED duty cycle for power reasons, it was only a short step further to a full PWM control for each LED. This made it possible to change the brightness of the LEDs dynamically, creating a pleasing “analog” look that contrasts with the typical full-on/full-off illumination of typical LED displays. Lighting an LED with 2% duty cycle looks dim, lighting it with 80% duty cycle looks bright.

The ATTINY has two hardware peripherals that can handle this type of PWM, but because Star Ring has nine LEDs, I had to design a software PWM solution instead. With some passably optimized code, and the ATTINY running at 4 MHz, the 9-channel PWM LEDs blink at 1148 Hz. This is fast enough to be mostly invisible to the human eye. The software PWM supports 16 brightness levels, and each brightness level maps to a duty cycle between 0:64 (off) and 64:64 (100% on).

While developing the PWM code, I discovered something interesting about the human eye and brain. I assumed that an LED with 80% duty cycle looks twice as bright as one with 40% duty cycle, but I was wrong. In fact, I’m hard-pressed to notice any visible difference in brightness, and at most I will say the 80% looks slightly brighter. Because we humans have evolved to cope with vast brightness differences in our environment, from the dazzling noon sun to the faintest starlight, there’s a decidedly non-linear mapping between the energy output of a light source and its perceived intensity. There are various formulas that attempt to convert between the two, but I just created a conversion table and then tweaked the numbers until it looked good. The result is that an LED set to 50% brightness doesn’t get a 50% duty cycle of 32:64, but only 15:64.

Side note: I couldn’t use green or blue LEDs here, because their forward voltage is about 3.0 volts, the same as my battery voltage. In this circuit, green/blue LEDs would be very dim at best. With a 3.0 volt battery I’m effectively limited to using yellow or red LEDs.

 
Blinky Functions

Even with the help of PWM, it’s not possible to animate the LEDs all the time without exhausting the battery in a matter of hours. Instead, Star Ring implements 12 different LED animation patterns that play periodically, with each pattern being about two seconds long. The patterns span the range of my LED-blinking creativity: a spinning wheel that gradually slows (shown in the title image), a flickering candle, fireworks, stars that slowly fade in and out, and many others. Pressing the button wakes up the device and plays the next animation pattern. Once awake, the device will also spontaneously play a new pattern about once per minute. Once in a rare while, it will play a special longer pattern. If you’re a 9-year-old kid, this is like the equivalent of “rare” Pokemon cards, and will keep you engaged with the Star Ring for long periods just to get that payoff.

 
Wearable?

My vision for Star Ring is to make it a wearable device, either as a shirt/hat pin, a necklace, or something else. I haven’t yet figured out the best attachment method for clothing, so for the moment I’m hanging it off a shirt pocket zipper. If anybody has a great idea on how I could incorporate a pin or a snap into the PCB design, please let me know.

The periodic but infrequent animation patterns are intended to support the wearable design. Even if the battery allowed it, a constantly-animating LED display would quickly grow annoying and get switched off, or else would be tuned out and ignored. But when Star Ring periodically flares to life, it always grabs attention. “What that?” people will ask. Then as I stand talking to them, they’ll interrupt “oh it did it again!” It naturally draws people in, and is a great little accessory for the kinds of events that welcome a blinking PCB with stars and moons (whatever those might be).

 
Power Consumption and Battery Life

The CR2032 battery provides a paltry 220 mAh at 3.0 volts. That’s not much. The 220 mAh capacity also assumes a current draw of only 0.1 mA. If the circuit draws current at a higher rate, the effective battery capacity will be even less. But with a bit of experimentation, I was able to design the hardware and firmware to get a projected battery life of 1 year. Not bad!

While an LED animation pattern is playing, the current from the battery varies between 10 to 24 mA depending on the pattern. Clearly the Star Ring can’t afford to do that very often. With no LEDs illuminated but the ATTINY still running, the current is 2.9 mA, which is still far too high. To save power, in the time between each LED animation pattern the ATTINY slows its clock to 31 kHz, disables all hardware peripherals except the timer, and enters idle mode. In this mode the main CPU clock is halted, but the peripheral clock continues to run, so the device can be awakened from a timer interrupt or external pin change interrupt. The current from the battery in this state is only 99 microamps. Much better! According to the datasheet it should be even lower, about 10 microamps, but I’ll take what I can get.

But even 99 microamps is too much current for the long term. That rate of current would deplete the battery in three months all by itself, without ever illuminating the LEDs. To save more power, if an hour has passed without the button being pressed, the ATTINY will enter power-down mode. In this mode all clocks are halted, and the IO buffers are disabled except for the one pin connected to the button. The device can’t be awakened by a timer interrupt, but now requires an external pin change interrupt from the button. When in this state, the current from the battery is a minuscule 120 nanoamps. Yet this tiny amount of current is still enough to maintain the CPU state and the contents of RAM. As soon as the button is pushed, Star Ring immediately jumps back to life and continues with the next LED animation pattern.

Sometimes the operator may want to force Star Ring into power-down mode immediately, without waiting for the one-hour timeout. Pressing and holding the button for a few seconds will play a “shutdown” LED animation, followed by the device going immediately to power-down mode. The battery life in this state is effectively infinite, or as long as the shelf life of the battery. No further LED animations will play until the device is reawakened by another button press.

 
Next Steps

I don’t think this is a product for sale – unless 20 people immediately respond saying they want to buy a Star Ring. It would be challenging to mass produce, because the intentional misuse of the top copper layer for decorative purposes means it would be difficult to use a solder stencil and an oven to assemble the boards. Solder paste would stick to the exposed copper of the moons and stars, spoiling the design. So this is probably just a project of personal whimsy for myself, friends, and family.

Commercial or personal, I do have a couple of ideas for Star Ring version 2:

  • Put the battery on the back, and the ATTINY chip on the front. This would look better, since the battery is kind of ugly.
     
  • Add mounting holes, clips, snaps, or something else to provide for easy use as a wearable.
     
  • Use a right-angle pushbutton instead of a standard button. With a clothing mount, the standard button may be painful or socially awkward to push in, against the body. A right-angle button would push parallel to the body.
     
  • Use red/yellow dual-color LEDs, that are really two LEDs in a single package. It might require an ATTINY with more pins, or perhaps a couple of extra transistors, but it would enable many more creative possibilities for the LED animation patterns.

Happy blinking!

Read 10 comments and join the conversation 

Thoughts on Low Latency Interrupt Handling

How quickly can a modern microcontroller respond to an external interrupt? Is it possible to achieve consistent sub-microsecond response times, so that external glue logic like muxes could be replaced with software instead? That’s the question I raised at the end of my previous post. If it’s possible, then a hypothetical future redesign of the Floppy Emu could be built using a single fast microcontroller, instead of the present design that combines a slower microcontroller and a CPLD for programmable logic.

 
Defining the Challenge

When Floppy Emu is emulating a 3.5 inch floppy drive, the computer controls it using an interface similar to a 16-entry 1-bit memory. Or 16 1-bit registers. The contents of these registers are mostly status flags, like whether a disk is inserted, the disk is write-protected, or the head is at track 0. But some of the “registers” are actually dynamically changing values, like the instantaneous data bit at the current head position of the rotating disk, or the tachometer signal from the disk’s motor rotation.

Here I’ve renamed the actual signal names on the interface to help make things clearer:

A3..A0 – The memory address
R – The memory output bit (when reading memory)
WE – Write-enable

For reading data, whenever the address bits A3..A0 change, the value of R must be updated within 500ns. It’s like a memory with a 500ns access time. Also whenever a status flag changes, or one of the dynamic values changes, R must be updated if A3..A0 already contains the address of the value that changed.

This is exactly the operation of a 16:1 multiplexor.

For writing data, at a positive edge of WE, the register at address A2..A0 must be written with the bit from A3. WE will remain high for 1000ns before it’s deasserted. Given this design, only eight of the sixteen registers are writable.

These timing requirements and the interface details are taken from this spec for the Apple 1.44MB Superdrive controller chip. The Apple 400K/800K drives may have different timing requirements, but I’m assuming they’re the same, or else more forgiving than the 1.44MB drive requirements.

So the challenge is this: the Floppy Emu microcontroller must respond to reads within 500ns, and to writes within a 1000ns write-enable signal window.

 
Choosing the Hardware

There are a bazillion microcontroller options, which is great, but also daunting. Some mcus have features that could make them well-suited to this job, like high clock speeds, dual cores, special peripherals, or programmable logic. The choice is also influenced by my desire for a mainstream mcu, with broad availability, good documentation and community support, good development tools, and a positive long-term outlook. This leads me to eliminate some options like the Parallax Propeller and Cypress PSoC.

For this analysis, I’ll assume the microcontroller is an Atmel SAMD51. If I were actually building this hardware now, that’s what I’d probably choose. The SAMD51 is a fairly new 120 MHz ARM Cortex M4 microcontroller, and is like an upgraded version of the popular SAMD21 used in the Arduino Zero. Adafruit had a gushing review of the SAMD51 when it was released last year. It has a nice selection of hardware peripherals, including some programmable logic, and it’s fairly fast, and cheap.

The SAMD51 is a single-core mcu. As we’ll see, it’s unlikely that a second core would help anyway.

 
SAMD51 Peripherals

An interesting peripheral on the SAMD51 is the Parallel Capture Controller, and it looks perfect for handling writing data. At the edge of an external clock signal (or WE signal in this example), the value on up to 11 other external pins is recorded and stored in a buffer. Then an interrupt is raised, so that software can examine and process the stored value. If necessary, I think it’s also possible to connect the PCC to the DMA controller, so that incoming values are automatically moved to a memory buffer, and there’s no chance of an overrun if the mcu doesn’t process the data quickly enough. This should guarantee that when writing data, no write is ever missed, although the mcu may not necessarily immediately react to the write.

Using the PCC, I think I can check the box for writing data, and assume it will work fine on the SAMD51.

What peripherals might help with reading data? The SAMD51 has an event system, enabling its peripherals to be chained together in custom ways, without any involvement from the CPU core. For example, using the event system, an edge transition on an external pin can trigger an SPI transmission to begin. Or when SPI data is received, it can trigger an external output pin to go low, high, or toggle. It’s very clever, but after looking at the details, I couldn’t see any obvious way to use the event system to handle reading data.

The SAMD51 also has a programmable logic peripheral called the CCL, Configurable Custom Logic. This looks like exactly the right kind of thing to help with reading data, and it is, but there’s simply not enough of it. It’s like an inferior version of one-quarter of a 16v8 PAL. There’s a total of just four LUTs, and each LUT has only three inputs, so it’s quite limited. The linkage between LUTs is also hard-coded, making it difficult to combine multiple LUTs to create more complex functions. The LUT inputs and outputs can be external pins, other LUTs, or certain peripheral ports, but not arbitrary registers or memory locations. In practice I don’t think the CCL can handle reading data for Floppy Emu, although it might help with it in some small way.

After looking at all the hardware peripherals, none of them seem well-suited to handling reading data. The best solution looks like a plain old interrupt. Whenever A3..A0 changes, it will trigger an interrupt, and the interrupt handler code will update R with the new value. Will it be fast enough?

 
Interrupt Handlers

Here’s some pseudocode for the interrupt handlers. First, handling writing data with the PCC:

PCC_Interrupt_Handler()
{
	registerNumber = (PCC_DATA & 0x07); // get A2..A0
	registerData = (PCC_DATA & 0x08) >> 3; // get A3 data bit

	internalState[registerNumber] = registerData;

	// set status flags here to step track, eject disk, etc. 
	// the main loop will do the actual work

	clearInterrupt(PCC);
}

Second, handling reading data with an external pin change interrupt. From my examination of the datasheet, it appears there’s only a single interrupt vector for external interrupts, and the interrupt handler must examine another register to determine which pins actually triggered the interrupt. That means the same handler must not only check the signals described above for reading data, but also other signals that require interrupt handling, like writeRequest (used when the computer writes to the disk) and multiple enable signals (used to select one of several disks that may be present).

bool driveEnabled = false;

EIC_Interrupt_Handler()
{
	if (EIC_INTFLAG & ENABLE_PIN_MASK)
	{
		// enable input has changed
		EIC_INTFLAG &= ~ENABLE_PIN_MASK; // clear interrupt
		driveEnabled = (PIN_STATE & ENABLE_PIN_MASK);
		if (driveEnabled)
			PIN_MODE_OUTPUT_ENABLE |= R_PIN_MASK;
		else
			PIN_MODE_OUTPUT_ENABLE &= ~R_PIN_MASK;
	}

	if (driveEnabled)
	{
		if (EIC_INTFLAG & WRITE_REQUEST_PIN_MASK)
		{
			// writeRequest input has changed
			EIC_INTFLAG &= ~WRITE_REQUEST_PIN_MASK; // clear interrupt
			writeState = (PIN_STATE & WRITE_REQUEST_PIN_MASK);

			// set status flags here to handle beginning and ending
			// of disk sector writes in the main loop
		}

		if (EIC_INTFLAG & ADDR_PINS_MASK)
		{
			// the A3..A0 input pins have changed
			EIC_INTFLAG &= ~ADDR_PINS_MASK; // clear interrupt
			registerNumber = ((PIN_STATE & ADDR_PINS_MASK) >> ADDR_PINS_SHIFT); // get A3..A0

			if (internalState[registerNumber])
				PIN_OUTPUT_VALUE |= R_PIN_MASK; // set R to 1
			else
				PIN_OUTPUT_VALUE &= ~R_PIN_MASK; // set R to 0

			if (registerNumber == INSTANTANEOUS_DISK_DATA_REGISTER)
				PIN_MUX[R_PIN] = PERIPHERAL_SPI;
			else if (registerNumber == MOTOR_TACHOMETER_REGISTER)
				PIN_MUX[R_PIN] = PERIPHERAL_TIMER_COUNTER;
			else
				PIN_MUX[R_PIN] = GPIO;
		}
	}
}

There’s some extra code about enable and write request. For the address, the interrupt handler must also adjust the mcu’s pin mux to control what’s actually driving the output on the R pin. In most cases it’s a GPIO, and the value comes from the internalState[] array and is set in the PIN_OUTPUT register. But for some addresses, the selected value is a dynamically changing quantity that comes from an active SPI peripheral, or a timer/counter peripheral.

 
Interrupt Priority and Pre-emption

EIC_Interrupt_Handler should be given the highest interrupt priority, higher than interrupts for other events like button pushes or SD card data transfers. With a higher priority, I’m fairly certain the EIC_Interrupt_Handler will interrupt any other interrupt handler that might be running at the time. Isn’t that what’s meant by the “nested” part of the ARM’s nested vector interrupt controller?

What about the PCC_Interrupt_Handler, for writing data? Should it have the same priority, or a lower one? Should reads interrupt writes? Can that ever actually happen? Does it matter? I’m not sure.

Can the EIC_Interrupt_Handler interrupt itself? If A0 changes, and EIC_Interrupt_Handler begins to run, and then A1 changes, will the handler be interrupted by a second invocation of the same handler? I think the answer is no. But what probably happens is that the interrupt flag will be set again, and as soon as EIC_Interrupt_Handler finishes, the interrupt will trigger again and EIC_Interrupt_Handler will run again. That seems inefficient, but it’s probably OK.

 
Interrupt Timing

Now we come to the critical question: can EIC_Interrupt_Handler respond to changes on A3..A0 with a new value on R within 500ns?

My research suggests the answer is maybe, but it will be difficult. I found two discussion threads where people were attempting to do something similar with Atmel SAM Cortex M4 and M7 microcontrollers. The first used a 300MHz SAME70, and found a 300ns latency to the start of the interrupt handler. The second used a 120 MHz SAM4E and found a 200ns latency to the start of the handler. These are the delays from the input pin transition to when the interrupt handler begins to run, and they don’t include the actual execution time of the interrupt handler, which is probably several hundred nanoseconds more.

Why so slow? First, the Cortex M4 has a built-in interrupt latency of 12 clock cycles. That’s to do whatever the hardware does for interrupt processing – save the execution state, fetch the interrupt vector, and whatever other voodoo is required. At 120 MHz that’s already 100ns gone.

Then the first instruction of the interrupt handler code must be fetched from internal flash memory. At 120 MHz, the flash isn’t fast enough to supply data in a single clock cycle. It requires 5 wait states, so a read from flash memory needs 6 total clock cycles. That’s another 50ns. So even in the theoretical best-case performance, it will still be a minimum of 150ns before the interrupt handler can begin to run. The two real-world examples I mentioned above were slower.

What about these flash wait states? Does it mean that every instruction in the interrupt handler will need 6 clock cycles to load from flash? I don’t understand the details, but the answer is no. There’s some prefetching and caching happening. Also most instructions are 16 bits wide, and the flash has a 128 bit width, so several instructions can be prefetched and cached at the same time. At least for straight line code with no jumps, I’m guessing that the rest of the interrupt handler can run at speeds approaching 1 instruction per clock cycle at 120 MHz. If anybody knows of good reference data for this, please let me know.

If the flash wait states are a major problem, it may be possible to copy the interrupt handler code to RAM and run it from there. I’m assuming the internal RAM has zero wait states, but I might be wrong on that point.

So 150ns before the interrupt handler can begin to run leaves 350ns remaining. That’s 42 clock cycles at 120MHz. So the interrupt handler can be up to 42 instructions long, on its longest execution path? Not quite, because some common instructions like STR require two clock cycles. Assuming an average time of 1.5 clock cycles per instruction, those 42 clock cycles are only enough for 28 instructions. Can EIC_Interrupt_Handler be implemented in only 28 Thumb assembly instructions? Um… maybe?

 
Complicating Factors

A few other factors raise the difficulty bar further. If the main code ever disables interrupts, or performs any atomic operations, it will delay running of the EIC_Interrupt_Handler and cut further into that 500ns window. In code that uses lots of interrupts, sometimes it’s impossible to avoid needing critical sections where interrupts are briefly disabled, for example to check some value and then set another value based on the first one. Failure to do this can cause rare but serious bugs, if an interrupt intervenes between reading the first value and setting the second.

Another serious complication is the possibility of multiple back-to-back invocations of EIC_Interrupt_Handler. What happens if one of the A3..A0 inputs changes immediately after execution of the line:

registerNumber = ((PIN_STATE & ADDR_PINS_MASK) >> ADDR_PINS_SHIFT); // get A3..A0

The remaining code will output the value of R for the old A3..A0, then the interrupt handler will finish, then a new interrupt will trigger and the handler will be invoked again to process the new A3..A0 input state. The total latency from the change on A3..A0 to the final correct output value of R will be something like 1.5 times the latency for the normal case. In a system where the timing margins are already very tight, that may be enough to break it entirely.

I don’t see any way around this back-to-back invocation problem. Moving EIC_INTFLAG &= ~ADDR_PINS_MASK to the end of EIC_Interrupt_Handler wouldn’t help anything. It would actually clear the pending interrupt flag from the second change of A3..A0 without ever responding to it, resulting in incorrect behavior.

 
Conclusions

So can this work – is software interrupt processing viable with these kinds of timing requirements? Is there some optimization trick I can use in the interrupt handler to improve things? Should I even spend the time to attempt it? Maybe there’s some clever way to use the built-in CCL programmable logic that I’ve overlooked, to help accelerate the interrupt handler or even replace it entirely? Or should I just write off this idea as too difficult and too problematic, and continue using a separate programmable logic chip for a mux and glue logic? Decisions…

Read 10 comments and join the conversation 

What’s on the BMOW Bookshelf

Over the decades, I’ve pared my collection of books down to just four shelves – only about 100 volumes from a lifetime of reading and studying. Because who needs physical books in an age where everything is available online? The few books that remain in my collection have survived a relentless process of repeated culling, until only the most meaningful and valuable ones remain. Want to take a peek? Here are some selections from the shelves.

Computation Structures, by Ward and Halstead. This was my original digital electronics textbook, covering everything from the transistor to CPU design, and is the only one of my bazillion university textbooks to have survived the cull. This book and MIT’s 6.004 class are what first got me excited about building with electronics. The book is a bit dated now: for something more up-to-date, maybe try Horowitz and Hill instead.

At the end of Computation Structures are schematics and control software for a DIY 8-bit computer called the MAYBE, from which I liberally borrowed ideas for my BMOW 1 computer. I built the MAYBE on a giant breadboard in a suitcase, chip by chip and wire by wire, over a semester-long class.

Practical Electronics for Inventors, by Scherz and Monk. This is a great reference for all those electronics details I should have learned, but didn’t. It puts an emphasis on practical engineering applications, with just enough theory to make sense of it all. When I find myself staring at some Forrest Mims schematic or other inscrutable circuit full of bipolar transistors and passives with unknown purpose, this book is a helpful deciphering aid.

The Black Art of Video Game Console Design, by André LaMothe. Ignore the slightly misleading title. This book is a great how-to guide for designing and building digital electronics projects of all types, with a particular emphasis on microcontroller projects. It covers analog and digital theory, circuit analysis, prototyping techniques, computer architecture, video synthesis, relevant tools, and more. If I could only have a single reference book for all my BMOW projects, this would probably be it.

FPGA Prototyping with Verilog Examples, by Pong P. Chu. For anyone that’s interested in building things with FPGAs or CPLDs, but finds it hard to get past the phase of blinking an LED, check out this book. Verilog can be a difficult language for someone coming from procedural languages like C++. It looks superficially similar, but is actually radically different, with every line of “code” running in parallel. This book goes beyond the Hello World stuff, and provides well-explained examples like a digital stopwatch, a soft UART, PS/2 keyboard IO, and a memory controller. There’s also a VHDL version of this book by the same author, if you prefer that to Verilog.

Macintosh Repair and Upgrade Secrets, by Larry Pina. This book is the bible for vintage Mac collectors. It’s long out of print, but I managed to snag an old copy from eBay. Does you Mac Plus make a “flupping” sound and refuse to turn on? Got an original Mac 128K with floppy drive problems, or video that’s reduced to a single horizontal line? Larry explains how to fix it, pointing to exactly which components need testing and replacement. These were the good old days when hackers could fix their busted motherboard by replacing transistor Q5 with a Radio Shack soldering iron, instead of making an appointment at the Genius Bar.

The New Apple II User’s Guide, by David Finnigan. Unlike Macintosh Repair and Upgrade Secrets, Finnigan’s Apple II bible was published in 2012 and is therefore relatively up to date. It covers all the common repairs for the Apple II family, as well as lots of reference material for collectors who may not have grown up with an Apple II, or forgotten what CALL -151 does. What’s better, it also discusses many of the challenges and options for using an Apple II computer in the 21st century. These include how to get your Apple II on the internet, solid state floppy disk alternatives, archiving tools, and the like.

Web Database Applications with PHP and MySQL. This book is essentially “how to build an interactive web site for dummies” using 2004 technology, so it’s dated now, but I still refer to it occasionally. Even for someone with no plans to build a web site, it’s a good introduction to the concepts of how web sites work under the hood. The book describes the ever-popular LAMP stack (Linux, Apache, MySQL, PHP) used to build the front-end and back-end of web applications. Using the skills learned from this book, augmented with more online learning, I’ve built about a dozen different web sites over the years.

Reversing: Secrets of Reverse Engineering, by Eldad Eilam. This is a fascinating look at how software programs are put together, and how they can be pulled apart and modified. I’m not aware of any other books like this one, which makes it that much more entertaining. Reverse engineering sometimes gets a bad reputation, and some people believe it’s only relevant for defeating copy-protection or other morally questionable purposes. The author introduces other uses, such as dissecting and analyzing malware, or understanding software that lacks source code and documentation. For anyone who enjoyed my post about what happens before main(), or my quest to create the smallest possible Windows executable, I recommend checking out this book.

Compute! Magazine, January 1985. This is the only remaining example of my once-mighty collection of computer magazines. There’s no special significance to this particular issue, other than that it was the only one to escape the recycling bin. It’s always amusing to browse. Loading software from cassette tape? Compuserve ads? And who didn’t love typing in those 20-page long BASIC program listings?

Flatland, by Edwin A. Abbott. This classic 1884 science fiction “romance of many dimensions” will blow your mind in just 92 pages. Imagine you’re a three-dimensional being attempting to explain things to inhabitants of Flatland, a 2D world. Imagine those Flatlanders attempting to explain their world to the miserable inhabitants of 1D Lineland. Now ponder the implications for our own three-dimensional existence: is it any more real than Flatland or Lineland? Where are the 4D visitors to Earth, and what might they look like? It’s a fantastic thought experiment, and the side-plot satire of Victorian society values is a hilarious bonus.

The Monopoly Companion, by Philip Orbanes. If you thought Monopoly was just a silly game for kids, you’re wrong. Back in my university days, my group of friends played many hundreds of Monopoly rounds, arguing strenuously over trading and strategy. A good friend was the Massachusetts champion one year, and represented the state at the National Monopoly Championships held in Atlantic City. The book describes general strategies like a preference for certain color groups (orange is best), and the importance of building to the three house level. We created a detailed software simulation accounting for all board squares, rents, cards, and dice roll probabilities, and ran it millions of times to determine the best strategy. Then we sent a letter to this book’s author, disputing some of his conclusions. God, what a bunch of nerds we were.

Your Money or Your Life, by Dominguez and Robin. We all spend a tremendous amount of time and energy in the attempt to accumulate money, sometimes without really considering why. Yes we need a safe place to live, food to eat, and other necessities – but what then? The money itself won’t bring happiness. Have we really thought about exactly what will bring happiness? How much of our life’s energy are we willing to trade away for those things? Having identified them, might there be other ways to get similar things that require less or no money? Can money even buy them at all? Your Money or Your Life helps guide readers towards a future where they may not be rich in dollars, but are rich in the things they value most.

Finding Your Own North Star, by Martha Beck. Where are we headed? Why? What’s important to our essential self? Like other self-help books or the time-honored What Color is Your Parachute, this book aims to help identify paths that are best-aligned with one’s values. It resonated with me in a way that other similar books didn’t. One clever exercise was to get everybody on your side, by intentionally redefining who “everyone” is. Another was an invitation to envision the best possible personal future imaginable, and map it out. The act of defining a goal already brings it closer.

The Goldfinch, by Donna Tartt. A friend and I read this Pulitzer Prize winning novel at the same time. She thought it was just OK. I thought it was outstanding. It was one of those books that when you finish, you set it gently in your lap and just stare vacantly into space for a long while, trying to digest it all. Tartt’s prose is rich and evocative like a master painting, and her story of loss and disaffection is heartbreaking. But what really moved me was the portrayal of the awkward intimacy of adolescent male friendships, the things said and unsaid but understood, the struggle to carry those relationships into adulthood. I felt Tartt had snooped inside my mind and extracted feelings I didn’t even know were there.

A family bible from 1817. Not many artifacts survived the past two centuries with my family, but this one did. It’s a huge volume with an imposing leather cover, and on the inside leaf are inscribed the births and deaths of generations.

“Joshua Ladd was born 1st mo. 7 1817”. That’s January 7 to us, so Joshua just had his 201st birthday recently. He was my great-great-great-great-grandfather, a farmer originally from Virginia who at age 14 moved to Ohio with his mother and siblings after the death of their father. Ohio had just become a state, and the family moved in search of new opportunities. Joshua opened a grocery and supply store in the town of Westville, near Damascus OH. I still have the wedding gloves of Joshua’s daughter Sarah tucked safely in a box. Any Ladds in your family tree? Maybe we’re related.

Essays by Ralph Waldo Emerson. This particular volume is from 1883, and was passed down through generations to my father and then me. Sometimes I think those Transcendentalist authors like Emerson, Longfellow, and Thoreau had everything figured out in 1870, and we haven’t accomplished much since. “When you were born you were crying and everyone else was smiling. Live your life so at the end, you’re the one who is smiling and everyone else is crying.”

Read 3 comments and join the conversation 

Meltdown and Spectre Vulnerabilities Explained

This week brought some fascinating news for CPU nerds: the revelation of security vulnerabilities in the basic hardware architecture of many modern processors. The Meltdown and Spectre vulnerabilities affect virtually all modern computers, according to a sensationalist headline at The Register which called them the “worst ever” CPU bugs. Unlike bugs in a specific software program or operating system that can be fixed with a patch, these are fundamental flaws in the very design of the CPU.

What are these vulnerabilities exactly, how do they work, and how are Meltdown and Spectre different from each other? I must have read twenty different news stories without finding a clear answer. The best I could get was that these vulnerabilities somehow involve the CPU’s use of speculative execution: an optimization trick that executes code before it’s known whether the code should really be executed, discarding the results if it’s later determined that the code wasn’t needed. But as a guy who designs custom CPUs as a hobby, I needed a better answer than that. So I started poking beyond the news headlines into the gory tech details.

What follows below is my attempt to explain Meltdown and Spectre to myself, and by extension to readers of this blog. It assumes readers already have some basic knowledge of concepts like CPU internals, caches, and operating systems. But moreso than normal for this blog, I’ll be discussing details that I don’t fully understand myself, and my explanations may be flawed. If you find an error or omission, kindly post a comment and let me know.

 
Speculative Execution + Caching = Vulnerability

Both Meltdown and Spectre exploit the fact that speculatively executed code can modify the CPU cache. Even if the code is never “really” executed, meaning that it never modifies CPU registers or memory or other processor state, the cache effects of the speculatively executed code can be observed by cleverly-constructed code that follows it, by testing what is and isn’t cached. Consider this example, taken from Google’s Project Zero Blog:

struct array {
  unsigned long length;
  unsigned char data[];
};

struct array *arr1 = ...;
unsigned long untrusted_offset_from_caller = ...;

if (untrusted_offset_from_caller < arr1->length) {
  unsigned char value = arr1->data[untrusted_offset_from_caller];
  ...
}

If arr1->length is not presently in the cache, the CPU won’t immediately know whether the if clause will evaluate true or false. If it guesses true, then the if body will execute speculatively while arr1->length is fetched from main memory. Speculative execution will cause arr1->data[untrusted_offset_from_caller] to be loaded from main memory into the cache. This behavior can be leveraged in several different ways (see below) to gain information about protected regions of memory that are supposed to be private: memory owned by other processes or the kernel. The ability to observe protected memory makes it possible to read passwords, Bitcoin keys, emails, or other sensitive information.

 
Spectre: Speculative Execution in Branch Prediction

The Spectre vulnerability requires getting the victim (the kernel or another process) to run specially-constructed code, which then leaks information through the cache effects of speculative execution. Consider an expanded version of the previous example.

struct array {
struct array {
  unsigned long length;
  unsigned char data[];
};

struct array *arr1 = ...; /* small array */
struct array *arr2 = ...; /* array of size 0x400 */
unsigned long untrusted_offset_from_caller = ...;

if (untrusted_offset_from_caller < arr1->length) {
  unsigned char value = arr1->data[untrusted_offset_from_caller];
  unsigned long index2 = ((value&1)*0x100)+0x200;
  if (index2 < arr2->length) {
    unsigned char value2 = arr2->data[index2];
  }
}

If arr1->length is not currently in the cache, speculative execution will continue inside the body of the if clause. Either arr2->data[0x200] or arr2->data[0x300] will be fetched from main memory and cached, depending on the least significant bit of arr1->data[untrusted_offset_from_caller]. After the speculative execution has ended, the attacking user mode code can measure how much time is required to load arr2->data[0x200] and arr2->data[0x300]. Whichever one was cached will load faster, revealing whether the LSB of arr1->data[untrusted_offset_from_caller] is 0 or 1. By repeating this process with other bit masks, the attacker can eventually read all of arr1->data[untrusted_offset_from_caller]. And by the choice of untrusted_offset_from_caller, the attacker can read any memory location.

That’s the general idea. Some implementation details and optimization methods are described in the Project Zero blog. The blog also describes another Spectre variant exploiting speculative execution through indirect branches, which I didn’t examine.

How can a user process get the kernel to run this kind of specially-constructed code? It turns out that the Linux kernel has a feature called eBPF that’s designed for this exact purpose, presumably to allow for device drivers or socket filters or other snippets of user-provided code that need to run in the kernel. I’m assuming Windows and Mac OS have something similar. Since running arbitrary user-provided code in the kernel would be a huge security vulnerability itself, eBPF actually runs the code in an interpreter or a JIT engine. But that’s enough to exploit this vulnerability.

Note: after writing this post, I found a second explanation of Spectre from a group of academic researchers working independently from Google Project Zero. Their paper describes Spectre slightly differently, and discusses attacking other processes rather than the kernel. It also includes a proof of concept Javascript attack, in which a malicious bit of Javascript is able to read private memory from the web browser process. Relying on the fact that Javascript is typically JIT compiled to native code, and using debug tools to examine the native output, the authors were able to iteratively tweak the Javascript source code until it produced native code containing an exploitable conditional branch of the kind shown above.

Spectre affects essentially all modern CPUs, given the proper conditions: Intel, AMD, ARM, etc.

 
Meltdown: Exploiting Out of Order Execution

Meltdown is similar to Spectre, in that they both leak information through the cache from instructions that were never “really” executed. However, with Meltdown there’s no branching involved. The attack relies on the way modern CPUs optimize performance by employing out of order instruction execution. Due to the availability of CPU execution units and dependent data, instructions are sometimes executed in a different order than they appear in a program, but they are retired (registers and state are updated) in order. The fact that they were executed out of order should be invisible to the program, but Meltdown shows that’s not always true. The details are described at cyber.wtf and in a separate academic paper.

When instructions are executed out of order, but aren’t yet retired, their results exist in a kind of limbo that’s very similar to speculatively executed code from branch prediction. However, it’s not clear whether this should properly be called speculative execution. The Meltdown paper refers to these as “transient instructions”, which seems like a good term.

Consider this code:

mov rax, [someKernelAddress]
and rax, 1
mov rbx,[rax+someUserModeAddress]

The first instruction should cause an exception if executed from a user mode process, because it’s an attempt to access kernel memory. However, at least on Intel architectures, it appears that the exception doesn’t occur until the instruction is actually retired. If this code is executing as part of a transient instruction sequence, no exception will yet occur, the privilege violation (bypassing kernel memory protection) will be ignored, and the transient read will succeed. The following transient instructions that use the read value will also succeed, and will affect the cache state in the same way as Spectre, creating a side-channel for leaking information about kernel memory.

Eventually, the first instruction will be retired and the exception will occur. The state changes related to the following instructions will be discarded, because those instructions were never really executed, but their cache side effects will remain. After an exception handler resolves the exception, further code can measure how long it takes to load from address someUserModeAddress vs someUserModeAddress+1, thereby inferring the LSB of someKernelAddress. Further iterations can read the other bits.

To ensure that the Meltdown code sequence executes as an out-of-order transient sequence, the cyber.wtf technique includes a long series of filler instructions ahead of it. These filler instructions all use a different execution unit than the units needed by the Meltdown sequence, and create a long chain of interdependent instructions. So the CPU must sequentially execute the filler instructions one at a time, but meanwhile it can also jump ahead and execute the Meltdown instructions out of order.

Note that “out of order” here refers to the Meltdown code being executed out of order relative to the filler instructions. The Meltdown instructions themselves are executed in order relative to each other, since each one is dependent on the previous one.

At this time, Meltdown appears to be limited to Intel CPUs only. It’s uncertain whether this is due to a fundamental difference in how Intel handles memory protection with respect to out-of-order execution, or is simply due to differences in the size of the reorder buffer between CPU vendors. In a short aside on Meltdown, the Spectre paper states “Meltdown exploits a privilege escalation vulnerability specific to Intel processors, due to which speculatively executed instructions can bypass memory protection.” However, the Meltdown paper reports that the authors were able to observe bypassing of memory protection during out of order execution on ARM and AMD processors too, but were unable to construct a working exploit.

 
Impact and Mitigation

Both of these vulnerabilities are very bad, enabling user mode code to read other protected memory. However, they’re both local exploits: the attacking code must be running on the machine being attacked, so an attacker must somehow get their code onto your machine first. For this reason, the greatest risk is probably to cloud computing environments, where processes from many different people are running on the same machine, supposedly isolated from one another thanks to memory protection. But there’s also a risk in any situation where one computer runs code received from another, even inside a VM or sandbox.

It’s not clear how Spectre can be fixed in software, because it relies on CPU features that are fundamental to all modern processors. In fact, I don’t think it can be fixed in software – at least not in any general way. One web site says, “as Spectre is not easy to fix, it will haunt us for a long time.” The only good news is that convincing the kernel to run an attacker’s special code with eBPF or other methods isn’t easy. And web browsers can be patched to protect against the sort of Javascript Spectre attacks described in the research paper. But other opportunities for Spectre attacks will remain. Future CPUs may contain hardware fixes for Spectre, but they’ll likely come with a complexity and performance penalty. Maybe cache lines will have to be treated as process-specific data instead of as shared resources.

Meltdown is the more serious of these two vulnerabilities, because it happens entirely in a single user space program and doesn’t require any special code in the victim process or kernel. In the real world, this makes it much easier for an attacker to exploit. Fortunately Meltdown can be fixed at the operating system level, but the fix carries a performance penalty that may be as high as 30%.

In the Meltdown sample code, readers may have wondered how an instruction like mov rax, [someKernelAddress] could possibly work in user mode code, even speculatively, since the kernel uses a different virtual to physical address mapping than the user mode process. It turns out that Linux (and I assume other operating systems too) maps the kernel’s memory into the address space of every user process, for performance reasons. By keeping the kernel permanently mapped, there’s no need to flush the TLB when switching between user and kernel space, and TLB entries for kernel space never need to be flushed. The processor’s MMU can normally be trusted to prevent user processes from accessing this kernel memory – except in the case we’ve just seen. The details are nicely explained in the description of a related technology named KAISER.

The “fix” is KPTI: kernel page table isolation. This ends the longstanding practice of mapping kernel memory into the address space of user processes. This ensures that mov rax, [someKernelAddress] won’t be able to reveal any information. However, it means that the TLB must be flushed every time there’s a switch between user and kernel code. If a user process makes frequent kernel calls, the constant TLB flushing will be expensive and have a significant negative performance impact.

 
The Future

Meltdown and Spectre are scary, not only because of what they can do themselves, but because they introduce a Pandora’s Box of new vulnerability types we’re sure to see more of in the future. We can no longer think about security analysis as something that gets applied to specific software programs or operating systems, examining the code and imagining it being executed instruction by instruction in an abstract environment. We must now consider the specific highly complex CPU (often not fully documented) that runs this software, and understand the many subtle ways in which the true hardware behavior differs from the instruction-by-instruction conceptual model.

The KAISER document mentions techniques like exploiting timing differences in fault handling, observing the behavior of prefetch instructions, and forcing faults using the Intel TSX (transactional memory) instructions. This will be a new frontier for most developers, forcing them to peel back a layer of abstraction when evaluating future computer security issues. The world just got a lot more complicated.

Read 4 comments and join the conversation 

Custom Mini Case for Macintosh LC, P475, Q605

Here’s a custom laser-cut case for the Macintosh LC family, Performa 400 series, and Quadra 605. By removing the internal floppy drive and fan, and replacing the internal SCSI drive with a SCSI2SD board, I was able to make a design that’s about half the size of a standard LC case. The pieces fit together loosely with tabs and slots, and then screws and nuts in T-slots provide extra support to make everything nice and solid. The finished case isn’t much bigger than my keyboard, which is neat. Here it is outfit with a Floppy Emu and an ADB-USB Wombat:

The logic board is screwed into the base piece, and the PSU is strapped down with zip ties. The case is a simple six-sided box, with the addition of a seventh interior piece that I call “the shelf.” This piece separates the rightmost interior region into lower and upper sections. The lower section houses any PDS plug-in card, and the SCSI2SD rests on top of the shelf in the upper section. Because of the way the SCSI cables are oriented, the SCSI2SD is mounted upside-down.

Here are the parts before assembly:

And the final product is shown at the top of this post. It’s an obnoxious shade of orange with a clear top. Who wants beige, anyway?

The speaker is taped to the inside of the case to prevent it from moving around, which is ugly. There’s not enough space for it to lie flat, so it’s propped up at a strange angle. I’ll hunt around for a smaller 16 ohm speaker to use in its place.

There’s a slot in the front where I’ve run a floppy ribbon cable, so I can hook up a Floppy Emu when needed.

The space above and below the shelf is very cramped, and I probably should have made it larger. Initially I couldn’t get my PDS ethernet card to fit, but after shaving a few millimeters by removing the metal shield from around the ethernet jack, it just barely squeezes in. The SCSI2SD was also a very tight fit. Using a SCSI cable with integrated strain relief, it wouldn’t fit, and I had to substitute a different SCSI cable without strain relief that’s a couple of mm thinner.

The case opening for the PDS card is fine, but without the metal shield, there’s a gap around the ethernet jack. Bonus ventilation! Here’s a photo of that, along with the right side vents:

With no fan I thought the computer might overheat, so I was prepared to take some temperature measurements. Stuffed with the guts of a Performa 475, I used an IR thermometer to take some readings at different spots inside the case, and I couldn’t find anything warmer than 112 F / 44 C. I was ready to mount a tiny fan inside, but that doesn’t look necessary now since the passive cooling is adequate.

If anybody wants to build their own, or use this as a starting point for further experiments, take these files. There are two files: one for the bottom and sides, and one for the top and shelf, so you can have the two different sheets made in different colors. Upload the files to ponoko.com, and have each one cut on a P2 sized 3mm thick acrylic of your color choice. You can also use 3mm MDF wood if you want a different look. Along with the case pieces, you’ll need 11 M3x10mm screws with matching nuts. #4-40 size screws probably fit too, but I haven’t tried it. You’ll also want some plastic zip ties to strap down the power supply.

A few things I’d do differently, if I were going to do this again:

  • Add about 3mm to the height under the shelf, to fit thicker PDS cards
  • Add about 3mm to the height above the shelf, to fit thicker SCSI cables
  • Integrate small feet into the side pieces, to elevate the case for better underside airflow
  • Have fewer vents around the PSU, and more vents around the CPU and PDS card
  • Cut the vent slots into the shape of an Apple logo
  • Reposition the floppy cable slot, so it’s better aligned with the logic board’s floppy connector
  • Add an opening for a power LED so I can tell when it’s on

Now back to playing with my little orange monster…

Read 5 comments and join the conversation 

Future Hardware with Animal Names

Yesterday’s post mentioned some hypothetical marsupial-themed hardware: WiFi Wallaby, Video Platypus, and others. While these were meant as a joke, they got me thinking about what exactly a “Video Platypus” and friends might do, and I’m outlining some possibilities below. These are all tied loosely into vintage Macintosh hardware, although other ideas of interest to the general Arduino/RPi audience would be nice too.

 
Video Platypus

This might be a way of providing video out for compact Macs like the Plus and SE. I’ve discussed a few potential methods for doing this before. One approach is to directly tap the CRT video and synchronization signals and resample/convert them to a standard format. Another possibility is sniffing the address and data bus to watch for CPU writes to the framebuffer region of main memory, then use that to construct a new video signal.

Video Platypus could also be a converter or upscaler for the Mac II series and later machines. VGA adapters for these machines are inexpensive and easy to find, but VGA itself is a slowly dying standard. It would be nice if I could get a direct HDMI or DVI-D output from my 680X0 or PowerMac. Probably this wouldn’t need to be Mac-specific – it would just be a VGA to HDMI converter with a different physical connector to support the Mac. Something like this must surely exist already?

 
Disk Kangaroo

An external fileserver would be nice for old Macintosh computers: a device you plug into the computer and that appears as a large local or remote disk. Floppy Emu already serves this purpose when it’s configured in HD20 hard disk mode, but only a small number of Macintosh models support HD20 and have the necessary external floppy connector.

Disk Kangaroo could be something like a Floppy Emu for LocalTalk. Just plug it into the Mac’s LocalTalk port (the printer port), and it would appear as a fileserver. You wouldn’t be able to boot from it the way you can from Floppy Emu, but it would work on virtually every Mac model and system software version. The I/O speed would be about the same as Floppy Emu, I think.

The same idea could be applied to a SCSI disk instead, so the device would appear as a local disk and the computer could boot from it. This would be similar to SCSI2SD, except instead of formatting the whole SD card as a Macintosh disk, the SD card would contain a library of disk images to choose from, just like Floppy Emu. This would make it easier to set up and use for file transfers to and from an internet-connected PC.

Both the SCSI and LocalTalk disks could also use remote storage instead of an SD card. The files could be served from a PC on the same LAN, which would might require some special software on the PC, or the device could potentially do Appletalk-to-Samba translation. Or files could be served directly from a cloud storage account like DropBox.

 
WiFi Wallaby

Everybody loves the ESP8266 for connecting oddball things to WiFi. What might this do for a vintage computer? Most old Macs are capable of Ethernet networking, although many require an add-in networking card that’s now rare. I’m not sure if it’s easy or even possible to go from that to a wireless network connection.

What might you use this wireless connection for – general web surfing, email, and FTP? Or for connecting to other vintage Apple computers and printers wirelessly with Appletalk?

Maybe this could be like a WiFi version of Farallon PhoneNet. Connect a WiFi Wallaby to each of your computers and printers and they’ll auto-connect and form an Appletalk network. Same idea as the phone cables in PhoneNet, but wireless.

 
Printer Koala

A clever microcontroller board with the necessary physical connector could emulate an Imagewriter II or other 80’s – 90’s Apple printer. What would be the point of that? Maybe it could act as a print server or translator, enabling the old Macs to use modern printers. The need for printer drivers could make that difficult, though. Or maybe “printing” could perform another function like converting the document to PDF and storing it on an SD card or on a cloud-based server. Or it might implement a print-to-Facebook or print-to-Twitter feature.

Working in the opposite direction could be interesting too: a device that connects to an Imagewriter II or Stylewriter or LaserWriter. The device could put these classic printers on a network so that modern computers could print to them from Windows, OSX, or Linux. There would be a question of printer drivers again, but for relatively simple printers like the Imagewriter that might be doable.

Read 12 comments and join the conversation 

Older Posts »