BMOW title

Archive for April, 2010

Experimental Hardware

With the design of the Tiny CPU core more or less finished, I’ve started thinking about how to build a small computer around it. My goal is to create a simple machine with a keyboard input, a 4-line LCD output, and a few buttons and LEDs for debugging. Everything should be mounted on a custom PCB that I’ll design as well.

I’ve purchased an Altera USB-Blaster, and a CPLD prototyping board containing the same CPLD model that I plan to use. This will let me see exactly how someone else built a working system around this device, and give me something to compare to when my own machine inevitably fails to even turn on after it’s built. I can also add a few components to the prototyping board, to try out a scaled-back version of the computer design before I commit to manufacturing my custom PCB.

The documentation with this board was pretty sparse, and the USB-Blaster clone had none at all, but after a little work I managed to figure it out. I’ve been able to reprogram the CPLD on the board and run a few basic LED-blinking tests. If I get motivated, I may try to fit a RAM, ROM, and a few other parts in that empty area on the right, and see what I can do.

For the ultimate Tiny CPU PCB, even for a “simple” system, there are going to be quite a lot of components. Assuming I use the free version of Eagle for the PCB layout, with its 10cm x 8cm area limit, I may need to stack two or even three boards to get everything in. The semi-final parts list is:

  • CPLD #1 – for the Tiny CPU
  • CPLD #2 – for address decoding, LCD interface, keyboard interface, etc
  • ROM (in a ZIF socket, or maybe a JTAG-programmable ROM)
  • RAM
  • clock oscillator
  • DC power jack
  • voltage regulator
  • reverse voltage protection diode
  • capacitors for voltage regulator
  • PS2 keyboard jack
  • pull-up resistor for keyboard clock
  • Schottky inverter for keyboard clock, to address very slow slew rate (based on BMOW experience)
  • LCD connector header
  • resistor for LCD backlight
  • variable resistor for LCD contrast
  • piezo beeper
  • variable resistor for volume
  • transistor for piezo power
  • 7-segment LED
  • current-limiting resistors for 7-segment LED
  • reset button
  • pull-up resistor for reset button
  • power LED
  • current-limiting resistor for power LED
  • on/off switch
  • rotary encoder
  • push button
  • ISP/JTAG header (connect both CPLDs into a JTAG chain)
  • RC reset circuit
  • debug headers

That’s a lot of stuff to fit into 80 cm^2. For comparison, the board in the photo above is about 126 cm^2, but contains less hardware than what I think I’ll need.


Variable Size Instructions

My analysis of the advantages of fixed-size instructions proved to be badly flawed. The improvements I saw when switching to a 16-bit fixed instruction size were not what I originally thought: the size and speed gains came from the reduction in address size, which reduced the size of many instructions, and sped up their execution. The gains had nothing to do with the fact that all instructions were now a fixed size. In fact, going to a fixed size made matters worse for instructions like push and increment, which were now larger and slower.

Fortunately, this was almost trivially easy to fix. With just a few lines changed in the assembler and Verilog source, I was able to restore all the implicit instructions to a single byte, while keeping address-oriented instructions at two bytes (with an embedded 10-bit address). That provides the best of both worlds:

| | Variable Size, 16-bit addr | Fixed Size, 10-bit addr | Variable Size, 10-bit addr |
|---|---|---|---|
| macrocells | 119 | 112 | 116 |
| verification program size (bytes) | 2055 | 1890 | 1629 |
| verification program execution time (clocks) | 835 | 574 | 552 |

The gains aren’t amazing, but every little bit helps. The space savings are especially nice, since with the 10-bit address space, I’ll need to make the most of every byte.
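To make the size difference concrete, here is a minimal Python sketch of a mixed one-byte/two-byte encoding. The mnemonics, opcode numbers, and bit positions are assumptions for illustration only; the real Tiny CPU encoding isn't spelled out here.

```python
# Hypothetical opcode numbers (6 bits each); not the real Tiny CPU values.
IMPLICIT = {"PHA": 0x01, "INX": 0x02}    # one byte, no operand
ADDRESSED = {"LDA": 0x10, "JMP": 0x11}   # two bytes, 10-bit address


def encode(mnemonic, addr=None):
    """Encode one instruction as 1 byte (implicit) or 2 bytes (addressed)."""
    if mnemonic in IMPLICIT:
        # Assumed layout: opcode in the top 6 bits, low 2 bits unused.
        return bytes([IMPLICIT[mnemonic] << 2])
    opcode = ADDRESSED[mnemonic]
    if addr is None or not 0 <= addr < 1024:
        raise ValueError("addressed instructions need a 10-bit address")
    # Assumed layout: opcode plus address bits 9..8, then address bits 7..0.
    return bytes([(opcode << 2) | (addr >> 8), addr & 0xFF])


program = [("LDA", 0x100), ("INX",), ("PHA",), ("JMP", 0x000)]
variable_size = sum(len(encode(*ins)) for ins in program)
fixed_size = 2 * len(program)  # every instruction padded to 16 bits
print(variable_size, fixed_size)
```

In this toy program, the two implicit instructions each save a byte compared with the fixed 16-bit format.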


Tiny CPU Architecture

As promised, here’s the Tiny CPU architecture diagram. SP is the stack pointer, and is 6 bits, providing a 64-entry stack. EA is the effective address, used for data load/store from absolute or computed addresses. PC is the program counter. The accumulator A and index register X are the only data registers. The datapath is controlled by a state machine and combinatorial logic, using the current opcode, state, and arithmetic/logic flags as input.

The diagram glosses over a few details, such as how the 8-bit data bus is connected to 10-bit address registers. Where busses and registers of differing sizes are connected, additional logic selects the low or high byte as needed.
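As a rough software model of that byte selection (not the actual Verilog), a 10-bit register sitting on an 8-bit bus might behave like the sketch below. The class and method names are invented for illustration.

```python
class AddressRegister:
    """Model of a 10-bit register accessed through an 8-bit data bus."""

    def __init__(self):
        self.value = 0

    def load_low(self, bus):
        # Replace bits 7..0, keep bits 9..8.
        self.value = (self.value & 0x300) | (bus & 0xFF)

    def load_high(self, bus):
        # Only the bottom 2 bits of the bus fit into bits 9..8.
        self.value = (self.value & 0x0FF) | ((bus & 0x03) << 8)

    def read_low(self):
        return self.value & 0xFF

    def read_high(self):
        # The upper 6 bits of the high byte read back as zero.
        return (self.value >> 8) & 0x03


ea = AddressRegister()
ea.load_low(0xCD)
ea.load_high(0xFE)   # the upper six bits of 0xFE are discarded
print(hex(ea.value))
```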


Tiny Asm

I’ve finished writing the Tiny CPU assembler, and it works. It took about four hours across two nights to get something with basic functionality. The curious can take a look at the assembler source code for details.

I don’t have much experience with writing these kinds of tools, so my parser is a little ugly. It goes line by line, ignoring whitespace and comments, until it finds a line beginning with a token. This token must be either an instruction mnemonic or a label. If it’s a mnemonic, a few additional checks determine the operand and address mode, and then a table lookup determines the opcode value for that instruction and address mode combination. If it’s a label, its address is stored, and all previously-pending references to that label are resolved. Anonymous forward and backward labels are also supported.
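The label handling described above amounts to single-pass assembly with backpatching. Here is a stripped-down Python sketch of that idea, not the actual assembler source. The syntax (labels ending in `:`, comments starting with `;`), the opcode table, and the byte layout are all assumptions, and anonymous labels are left out.

```python
# Hypothetical table: (mnemonic, has_operand) -> 6-bit opcode.
OPCODES = {("LDA", True): 0x10, ("JMP", True): 0x11, ("INX", False): 0x02}


def parse_number(text):
    """Return text as an integer (decimal or 0x hex), or None if it is a label."""
    try:
        return int(text, 0)
    except ValueError:
        return None


def assemble(source):
    """Assemble in one pass, backpatching forward label references."""
    out = bytearray()
    labels = {}    # label name -> address
    pending = {}   # label name -> offsets of instructions waiting for it

    def patch(offset, addr):
        if not 0 <= addr < 1024:
            raise ValueError("address does not fit in 10 bits")
        out[offset] |= addr >> 8          # address bits 9..8 share the opcode byte
        out[offset + 1] = addr & 0xFF     # address bits 7..0

    for line in source.splitlines():
        tokens = line.split(";")[0].split()   # drop comments and whitespace
        if not tokens:
            continue
        if tokens[0].endswith(":"):           # label definition
            name = tokens[0][:-1]
            labels[name] = len(out)
            for offset in pending.pop(name, []):
                patch(offset, labels[name])   # resolve earlier references
            continue
        mnemonic, operand = tokens[0].upper(), tokens[1:]
        opcode = OPCODES[(mnemonic, bool(operand))]
        if not operand:
            out.append(opcode << 2)           # one-byte implicit instruction
            continue
        offset = len(out)
        out += bytes([opcode << 2, 0])        # address filled in by patch()
        addr = parse_number(operand[0])
        if addr is None:
            addr = labels.get(operand[0])
        if addr is None:
            pending.setdefault(operand[0], []).append(offset)
        else:
            patch(offset, addr)
    if pending:
        raise ValueError("undefined labels: " + ", ".join(sorted(pending)))
    return bytes(out)
```

Calling `assemble("start:\n  JMP start")` returns two bytes with the address of `start` (zero) embedded in them. A reference to a label that is never defined raises an error at the end of the pass.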

It would be nice to add features like named constants, conditional compilation, and macros. The assembler also lacks directives for setting the assembly address, or for embedding constant data like tables and strings. I’ll add some of those features later, as the need arises.


Fixed Size Instructions

I’ve finished my experiment with fixed-size instructions for Tiny CPU, and the results are encouraging. I did a straightforward conversion to a 16-bit instruction size, with the opcode in the upper 6 bits and the address (if any) in the lower 10. Here’s how it compares to the original, variable-size instruction version:

| | Variable Size | Fixed Size | Percent Reduction |
|---|---|---|---|
| macrocells | 119 | 112 | 6 |
| verification program size (bytes) | 2055 | 1890 | 8 |
| verification program execution time (clocks) | 835 | 574 | 31 |

So it’s an improvement across the board. The only drawback is that increasing the address size to something larger than 10 bits would be fairly difficult. It’s technically possible to fit all the opcodes into 5 bits (there are 31 unique opcodes), allowing for 11 bits of address. However, it would be a poor encoding that would probably require the decoding logic to be substantially more complex, increasing the macrocell count.
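For reference, the 16-bit layout (opcode in the upper 6 bits, address in the lower 10) can be sketched in Python as follows. The function names and the opcode value are made up for illustration, and the order of the two bytes in memory is left out.

```python
OPCODE_BITS = 6
ADDR_BITS = 10
ADDR_MASK = (1 << ADDR_BITS) - 1   # 0x3FF


def encode(opcode, addr=0):
    """Pack a 6-bit opcode and a 10-bit address into one 16-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 6 bits")
    if not 0 <= addr <= ADDR_MASK:
        raise ValueError("address does not fit in 10 bits")
    return (opcode << ADDR_BITS) | addr


def decode(word):
    """Split a 16-bit word back into (opcode, address)."""
    return word >> ADDR_BITS, word & ADDR_MASK


word = encode(0x2A, 0x155)   # 0x2A is a hypothetical opcode value
print(hex(word), decode(word))
```

Setting `OPCODE_BITS = 5` and `ADDR_BITS = 11` in the same sketch gives the alternative split mentioned above: 32 opcode slots and a 2K address range.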

I wrote a tool to convert variable-sized program binaries into fixed-size, but it’s ugly and brittle. My next step, therefore, will be to write a custom Tiny Assembler for my Tiny CPU.


Tiny CPU

I’ve got a working CPU! You can grab the Verilog source and a testbench here. The instruction set and addressing modes are as I described them in my previous post, except that I shrank the stack pointer to 6 bits (64 byte stack), and was able to add the missing branch if carry/zero not set instructions. The CPU has a 10-bit (1K) address space, and fits in 119 macrocells of an Altera EPM7128S, when set to optimize for area and with Parallel Expander Chain Length set to 0. Sometime soon, I’ll make some nice datapath diagrams and post them.

In addition to the small address space and limited instruction set, there are a few ugly elements of the design that were necessary to make it fit the device. The absence of a Compare X instruction is glaring, but it is impossible to include without significant changes. There’s also a wasted state after many of the math/logic ops, in which the Zero flag is redundantly set. This makes those instructions take one clock cycle longer than actually necessary, but was needed to avoid more complicated state transition logic. The Zero flag handling in general is definitely awkward.

So what’s next? I hope to shrink the design slightly further, by simplifying logic, using more Altera primitives, or by using a smarter instruction set encoding that uses instruction bits directly as control signals. If I can save a few more macrocells, I hope to increase the address space to 11 or 12 bits (2K or 4K), because 1K feels very limited.

Beyond that, I’m considering a few larger changes:

Fixed Instruction Size

The current design has instructions that are one, two, or three bytes in size. I’m considering moving to a fixed instruction size of 2 bytes: 6 bits for the opcode, and 10 bits for an address or constant value. This would simplify the state machine logic, eliminating extra states needed to perform operand fetches, and reducing the logic resources needed to implement the state machine. It would probably also result in slightly more compact code, making more efficient use of the limited address space.

The downside of a fixed instruction size is that it would also fix the address size at 10 bits (or maybe 11 if I’m really clever), with no hope of increasing it. It would also require an opcode register, to hold the first 8 bits of the instruction while the second 8 bits are fetched. And it would force me to throw out the bastardized 6502 assembler I’ve been using, and create some new software tools.

Larger Bus Size

If I switch to a fixed 16-bit instruction size, it may also be worthwhile to switch to a 16-bit data bus. This would permit loading an entire instruction in one clock cycle, eliminating the need for the opcode register, and further simplifying the state machine. The downside is that I’d then need extra logic to make the memory byte-addressable for load/store of data, or else increase the data word size to 16 bits and forget about byte addressing entirely. A larger data bus output mux would also be needed. And of course, two parallel 8-bit RAMs would be needed on the CPU board.
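The extra logic for byte addressing on a 16-bit bus boils down to using the lowest address bit as a byte-lane select. Here is a small Python model of that idea. The function names are invented, and putting odd addresses in the high byte is an arbitrary assumption.

```python
def read_byte(words, byte_addr):
    """Read one byte from a memory organized as 16-bit words."""
    word = words[byte_addr >> 1]      # upper address bits select the word
    if byte_addr & 1:                 # lowest bit selects the byte lane
        return word >> 8              # assumed: odd address = high byte
    return word & 0xFF                # even address = low byte


def write_byte(words, byte_addr, value):
    """Write one byte, leaving the other half of the word untouched."""
    index = byte_addr >> 1
    if byte_addr & 1:
        words[index] = (words[index] & 0x00FF) | ((value & 0xFF) << 8)
    else:
        words[index] = (words[index] & 0xFF00) | (value & 0xFF)


memory = [0xBEEF, 0x1234]
write_byte(memory, 3, 0xAB)           # replaces the high byte of word 1
print([hex(w) for w in memory])
```

In hardware, the write case is the awkward one, since the unused half of the word has to be preserved. With two parallel 8-bit RAMs this could be done by enabling only one chip for the write.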

Harvard Architecture

Not a Colonial Period building at Harvard University, but a computer with separate address spaces for programs and data. This would permit a 16-bit interface to program memory, and an 8-bit interface to data memory, which is seemingly the best of both worlds. The program memory address bus wouldn’t need a mux, because it would always be driven by the program counter. Separate program and data memories would also allow for faster CPU operation, by enabling instruction fetches and data access to happen in parallel. The total amount of addressable memory would also increase, because the program and data memories could each be 1K in size, for 2K total.

Separate program and data memories mean the CPU board would need two 8-bit ROMs as well as an 8-bit RAM, further increasing the component count.

The major drawback of the Harvard Architecture is that working with large data constants like strings and tables is cumbersome, because they must be loaded or copied a byte at a time using Load Immediate instructions. The indexed address instructions typically used to access such structures operate on the data memory. The standard solution to this problem is to use a Modified Harvard Architecture, adding new instructions like Load Constant Indexed to fetch values from program memory. Unfortunately that negates some of the original advantages, requiring an additional address register for program memory, an address bus mux, and additional complexity in the state machine.
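A toy Python model shows why constants are awkward in a pure Harvard design and what a Load Constant Indexed instruction would add. Everything here is a simplification for illustration: the class and method names are invented, and program memory is modeled as bytes rather than 16-bit words.

```python
class HarvardMachine:
    """Toy model with separate 1K program and data memories."""

    def __init__(self, program):
        self.program = bytes(program).ljust(1024, b"\x00")  # read-only ROM
        self.data = bytearray(1024)                         # separate RAM
        self.a = 0   # accumulator
        self.x = 0   # index register

    def load_indexed(self, base):
        """Ordinary indexed load: can only ever see data memory."""
        self.a = self.data[(base + self.x) & 0x3FF]

    def load_constant_indexed(self, base):
        """Modified-Harvard addition: indexed load from program memory."""
        self.a = self.program[(base + self.x) & 0x3FF]


m = HarvardMachine(b"HI")     # a string constant stored in program memory
m.x = 1
m.load_indexed(0)             # reads data memory, which doesn't hold the string
print(m.a)
m.load_constant_indexed(0)    # reads the second character from program memory
print(chr(m.a))
```

The second load is the one that needs the program memory address to come from somewhere other than the program counter, which is where the extra address register and mux come in.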

