
Tiny CPU in a CPLD

Tiny CPU is a custom “small CPU” design intended for implementation in a CPLD. Such soft CPU cores typically target an FPGA or large CPLD, but the target device for Tiny CPU is a small Altera CPLD with limited logic resources. This constrains the CPU to a minimal set of features in order to fit. It is an 8-bit CPU with only two data registers (A and X) and a 10-bit address space. The instruction set is a subset of the 6502 instruction set, with modifications to reflect the smaller address space and fewer registers.

Download the Tiny CPU file archive, including the assembler and Verilog source files.

Design

The project is split into two halves, originally imagined as separate chips, but now combined into one. The core CPU module is called Tiny CPU, while a companion module called Tiny Device implements address decoding, bank switching, and peripheral I/O. As a pair, Tiny CPU and Tiny Device are intended to be combined to make a working single-board computer, using only a CPLD and an external SRAM and ROM.

The original target device was Altera’s EPM7128, a 128-macrocell CPLD based on Altera’s older 5V technology. A single macrocell consists of one flip-flop plus some combinatorial logic, and can compute a one-bit result from 1-10 inputs, where the result is expressed as a sum-of-products of the inputs. An 8-bit register requires at least 8 macrocells, and structures like counters, adders, and muxes consume many more, so fitting a full-fledged CPU into 128 macrocells is a challenge. Tiny CPU was planned to occupy one EPM7128, with Tiny Device in a second identical CPLD. Verilog source for both designs was written and simulated, and both successfully fit into the target device, but no hardware was ever built using this design. See the download link above for the source.
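To make “sum-of-products” concrete, here is a small Python sketch that evaluates one output bit as an OR of AND terms, which is the Boolean form a macrocell computes. It is purely illustrative: `eval_sop` and `HALF_ADDER_SUM` are hypothetical names, and the sketch ignores real device limits such as how many product terms a single macrocell can use.

```python
# Illustrative model of one macrocell output: an OR of AND (product) terms.
# Each product term maps an input name to the value it requires
# (True = input used as-is, False = input used inverted).

def eval_sop(terms, inputs):
    """Evaluate a sum-of-products expression for one set of input values."""
    return any(
        all(inputs[name] == required for name, required in term.items())
        for term in terms
    )


# Hypothetical example: the sum bit of a half adder (a XOR b) in SOP form:
#   sum = (a AND NOT b) OR (NOT a AND b)
HALF_ADDER_SUM = [
    {"a": True, "b": False},
    {"a": False, "b": True},
]

# Print the truth table: a, b, sum
for a in (False, True):
    for b in (False, True):
        print(int(a), int(b), int(eval_sop(HALF_ADDER_SUM, {"a": a, "b": b})))
```

Even a single XOR needs two product terms, which hints at why wide structures such as adders and muxes use up macrocells so quickly.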

After a long break, development resumed with a new plan, this time using a single Altera Max II EPM570 CPLD instead of the two EPM7128s. The Max II is a more modern device using a different internal technology, and Altera states its logic capacity is equivalent to roughly 440 macrocells. It’s also a 3.3V device, so the SRAM, ROM, and other components from the original design were all migrated to 3.3V as well. Construction of a Tiny CPU demonstration computer using a custom PCB and this hardware is currently in progress.

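Opcode map. The column gives the high hex digit of the 6-bit opcode and the row gives the low digit; - marks an unused opcode.

opcode  0x         1x         2x         3x
x0      SUB abs    LDA abs    -          PLA
x1      SUB imm    LDA imm    -          PLX
x2      SUB abs,X  LDA abs,X  BNE abs    RETURN
x3      -          -          -          -
x4      ADD abs    STA abs    -          PHA
x5      ADD imm    STA imm    -          PHX
x6      ADD abs,X  STA abs,X  BEQ abs    JMP abs
x7      -          -          -          CALL abs
x8      CMP abs    LDX abs    -          INX
x9      CMP imm    LDX imm    -          DEX
xA      CMP abs,X  CPX abs    BCC abs    -
xB      -          CPX imm    -          -
xC      NOR abs    STX abs    -          -
xD      NOR imm    -          -          -
xE      NOR abs,X  -          BCS abs    -
xF      -          -          -          -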
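For an assembler or disassembler, the opcode map can be captured as a lookup table. This Python sketch is transcribed by hand from the map (the branch opcodes $22, $26, $2A, and $2E follow from their column positions), and `OPCODES` and `lookup` are hypothetical names, not the assembler included in the file archive. Entries with no operand, such as PLA and RETURN, are tagged as implied (“impl”).

```python
# Opcode map transcribed from the table: 6-bit opcode -> (mnemonic, mode).
# Opcodes missing from this dict are unused.
OPCODES = {
    0x00: ("SUB", "abs"), 0x01: ("SUB", "imm"), 0x02: ("SUB", "abs,X"),
    0x04: ("ADD", "abs"), 0x05: ("ADD", "imm"), 0x06: ("ADD", "abs,X"),
    0x08: ("CMP", "abs"), 0x09: ("CMP", "imm"), 0x0A: ("CMP", "abs,X"),
    0x0C: ("NOR", "abs"), 0x0D: ("NOR", "imm"), 0x0E: ("NOR", "abs,X"),
    0x10: ("LDA", "abs"), 0x11: ("LDA", "imm"), 0x12: ("LDA", "abs,X"),
    0x14: ("STA", "abs"), 0x15: ("STA", "imm"), 0x16: ("STA", "abs,X"),
    0x18: ("LDX", "abs"), 0x19: ("LDX", "imm"),
    0x1A: ("CPX", "abs"), 0x1B: ("CPX", "imm"),
    0x1C: ("STX", "abs"),
    0x22: ("BNE", "abs"), 0x26: ("BEQ", "abs"),
    0x2A: ("BCC", "abs"), 0x2E: ("BCS", "abs"),
    0x30: ("PLA", "impl"), 0x31: ("PLX", "impl"), 0x32: ("RETURN", "impl"),
    0x34: ("PHA", "impl"), 0x35: ("PHX", "impl"),
    0x36: ("JMP", "abs"), 0x37: ("CALL", "abs"),
    0x38: ("INX", "impl"), 0x39: ("DEX", "impl"),
}


def lookup(opcode):
    """Return (mnemonic, mode) for a 6-bit opcode, or None if it is unused."""
    if not 0 <= opcode <= 0x3F:
        raise ValueError(f"opcode {opcode:#04x} does not fit in 6 bits")
    return OPCODES.get(opcode)


print(lookup(0x15))   # ('STA', 'imm')
print(len(OPCODES))   # 36
```

Only 36 of the 64 possible opcodes are used, which leaves room in the encoding for future instructions.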

Addressing Modes

mode   name                   example      meaning
imm    immediate              LDA #$1F     operand is the literal byte $1F
abs    absolute               LDA $1FF     operand is the contents of address $1FF
abs,X  absolute, X-indexed    LDA $1FF,X   operand is the contents of the address formed by adding $1FF to the value in the X register
impl   implied                INX          operand is implied by the instruction
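The explicit modes differ only in how the operand value is obtained. The following minimal Python sketch shows this, where `operand` is a hypothetical helper and memory is a 1K byte array. The text does not say whether an abs,X sum past $3FF wraps around, so the 10-bit wraparound here is an assumption.

```python
ADDR_MASK = 0x3FF  # the CPU has a 10-bit address space


def operand(mode, field, x, mem):
    """Return the operand value an instruction sees.

    mode  -- "imm", "abs", or "abs,X"
    field -- the instruction's operand field (8-bit literal or 10-bit address)
    x     -- current value of the X register (0-255)
    mem   -- a 1K memory image, e.g. a bytearray of length 1024
    """
    if mode == "imm":
        return field & 0xFF                   # the literal byte itself
    if mode == "abs":
        return mem[field & ADDR_MASK]         # contents of the address
    if mode == "abs,X":
        # ASSUMPTION: base + X wraps around within the 10-bit address space.
        return mem[(field + x) & ADDR_MASK]
    raise ValueError(f"mode {mode!r} has no explicit operand")


mem = bytearray(1024)
mem[0x1FF] = 0x42
mem[0x204] = 0x99
print(hex(operand("imm", 0x1F, 0, mem)))      # 0x1f
print(hex(operand("abs", 0x1FF, 0, mem)))     # 0x42
print(hex(operand("abs,X", 0x1FF, 5, mem)))   # 0x99 ($1FF + 5 = $204)
```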

Encoding

Each instruction’s 6-bit opcode is packed into the most significant six bits of its first program byte. Instructions with no operands (implied addressing) occupy only that single byte; all others take two. Address operands are 10 bits, formed from the least significant two bits of the first program byte (the address’s high bits) followed by all eight bits of the second program byte. Immediate operands are 8 bits, taken from the second program byte.
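The byte packing is easy to get wrong, so here is a sketch of an encoder and decoder in Python. It assumes the two address bits in the first byte are the address’s high bits, and that those two bits are left at zero for immediate-mode instructions, which the text does not specify. `encode` and `decode` are hypothetical helpers, not functions from the Tiny CPU assembler.

```python
def encode(opcode, operand=None, mode="impl"):
    """Pack one instruction into its program bytes.

    The 6-bit opcode fills the top six bits of the first byte. For "abs" and
    "abs,X", the 10-bit address is split: its top two bits go into the low two
    bits of the first byte and its low eight bits form the second byte. For
    "imm", the 8-bit literal is the second byte.
    """
    if not 0 <= opcode <= 0x3F:
        raise ValueError("opcode must fit in 6 bits")
    first = opcode << 2
    if mode == "impl":
        return bytes([first])
    if operand is None:
        raise ValueError(f"mode {mode!r} needs an operand")
    if mode == "imm":
        # ASSUMPTION: the low two bits of the first byte are left at zero.
        if not 0 <= operand <= 0xFF:
            raise ValueError("immediate operand must fit in 8 bits")
        return bytes([first, operand])
    if mode in ("abs", "abs,X"):
        if not 0 <= operand <= 0x3FF:
            raise ValueError("address operand must fit in 10 bits")
        return bytes([first | (operand >> 8), operand & 0xFF])
    raise ValueError(f"unknown mode {mode!r}")


def decode(program):
    """Split a two-byte address-mode instruction into (opcode, address)."""
    b0, b1 = program[0], program[1]
    return b0 >> 2, ((b0 & 0x03) << 8) | b1


print(encode(0x10, 0x1FF, "abs").hex())   # 41ff  (LDA $1FF)
print(encode(0x11, 0x1F, "imm").hex())    # 441f  (LDA #$1F)
print(encode(0x38).hex())                 # e0    (INX)
print(decode(bytes([0x41, 0xFF])))        # (16, 511)
```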

Programmer-Visible Registers

PC   program counter (10 bits)
SP   stack pointer (6 bits)
A    accumulator (8 bits)
X    index register (8 bits)
SR   status register: carry, zero (2 bits)
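As a rough lower bound, every bit of stored state needs its own flip-flop, and therefore its own macrocell on a device like the EPM7128. Counting only the programmer-visible registers (the instruction register, any internal temporaries, and all combinatorial logic come on top of this):

```latex
\underbrace{10}_{\text{PC}} + \underbrace{6}_{\text{SP}} + \underbrace{8}_{\text{A}}
+ \underbrace{8}_{\text{X}} + \underbrace{2}_{\text{SR}} = 34 \text{ flip-flops},
\qquad \frac{34}{128} \approx 27\% \text{ of an EPM7128's macrocells}
```

That leaves at most 94 macrocells for instruction decoding, the ALU, address generation, and everything else.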

Processor Stack

LIFO, 64 entries, occupying $3C0 – $3FF and growing from the top down
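The stack can be modeled in a few lines of Python. This sketch rests on assumptions the text does not state: it treats the 6-bit SP as an offset from $3C0, starts SP at $3F, and uses 6502-style push (write, then decrement) and pull (increment, then read) with wraparound. The `Stack` class is hypothetical and not derived from the Verilog.

```python
STACK_BASE = 0x3C0  # the stack occupies $3C0-$3FF


class Stack:
    """64-entry stack addressed by a 6-bit stack pointer.

    ASSUMPTIONS (6502-style, not taken from the Tiny CPU Verilog): the byte
    address is STACK_BASE + SP, SP starts at $3F, a push writes then
    decrements, a pull increments then reads, and SP wraps modulo 64.
    """

    def __init__(self, mem):
        self.mem = mem      # 1K memory image, e.g. bytearray(1024)
        self.sp = 0x3F      # next free slot is $3FF, the top of the stack area

    def push(self, value):
        self.mem[STACK_BASE + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0x3F

    def pull(self):
        self.sp = (self.sp + 1) & 0x3F
        return self.mem[STACK_BASE + self.sp]


mem = bytearray(1024)
stack = Stack(mem)
stack.push(0x11)   # lands at $3FF
stack.push(0x22)   # lands at $3FE
print(hex(stack.pull()), hex(stack.pull()))  # 0x22 0x11
```

Because SP is only 6 bits wide, a 65th push silently wraps and overwrites the oldest entry rather than running into the rest of memory.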

 

Tiny Device

Tiny Device implements bank switching, address decoding, a PS/2 keyboard interface, serial input and output, a parallel LCD driver, tick counter, clock division, a general-purpose parallel port, and an I/O status register. More details about Tiny Device’s functions can be found in the Tiny Device introductory post. The bank switching mechanism is described here.

With its 10-bit address bus, the CPU sees 1K of memory. This is divided into two 512-byte blocks. Block 1 contains the stack, I/O ports, and a scratch RAM area. It is the “common” block, and is always present in the CPU’s address space no matter what is happening with bank switching. In contrast, block 0 is a swappable memory area, and can be mapped to any bank in physical memory.

Physical memory is 64K, divided equally between ROM and RAM, and partitioned into 128 banks of 512 bytes each. Any bank can be mapped into block 0. Bank 127 is always mapped into block 1, the common block.

The bank select register is part of the memory-mapped I/O ports in common memory. To swap a bank, the CPU only needs to write the new bank number to the appropriate address.
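The address translation can be sketched as a small Python model. Since the stack at $3C0 – $3FF sits in block 1, block 1 must be the upper half of the CPU’s address space ($200 – $3FF) and block 0 the lower half. The sketch assumes a physical address is simply bank * 512 + offset; `BankMapper` and `select_bank` are hypothetical names, and the actual I/O address of the bank select register is not modeled.

```python
BLOCK_SIZE = 512   # size of each CPU block and each physical bank
COMMON_BANK = 127  # bank permanently mapped into block 1


class BankMapper:
    """Map 10-bit CPU addresses to 16-bit physical addresses.

    Block 0 ($000-$1FF) shows whichever bank is selected; block 1
    ($200-$3FF) always shows bank 127.
    """

    def __init__(self):
        self.bank_select = 0  # after reset, bank 0 is mapped into block 0

    def select_bank(self, bank):
        """Model the CPU writing a new bank number to the bank select register."""
        if not 0 <= bank <= 127:
            raise ValueError("bank number must be 0-127")
        self.bank_select = bank

    def physical(self, cpu_addr):
        """Translate a CPU address into a physical address (assumed layout)."""
        if not 0 <= cpu_addr <= 0x3FF:
            raise ValueError("CPU addresses are 10 bits")
        block, offset = divmod(cpu_addr, BLOCK_SIZE)
        bank = self.bank_select if block == 0 else COMMON_BANK
        return bank * BLOCK_SIZE + offset


m = BankMapper()
print(hex(m.physical(0x3C0)))  # 0xffc0: start of the stack area, always bank 127
m.select_bank(5)
print(hex(m.physical(0x010)))  # 0xa10: offset $10 within bank 5
```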

A few benefits of this bank switching design are:

  • Upon reset, bank 0 is mapped to block 0. That puts 512 bytes of ROM, 440 bytes of RAM, the I/O ports, and the stack all in the CPU’s address space. That’s plenty for many small programs, and means they won’t need to bother with bank switching at all.
  • Larger programs (lots of program code) can be accommodated by bank switching code segments in/out of block 0, all operating on common data in block 1.
  • Programs operating on large data structures can copy some bank-switching helper code to block 1, then swap additional RAM banks in/out of block 0.
  • Arguments can be passed on the stack to ROM helper routines in other banks, because the stack is in common memory.
  • All of ROM is addressable, with no holes. This makes storing images, audio samples, and other data in ROM much easier.
  • There is no difference in handling between ROM and RAM banks. A program running entirely from RAM works just like one whose code is in ROM.
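Going the other way, the “no holes” property means any physical byte can be reached by selecting its bank and then accessing the matching offset in block 0. A hypothetical helper for that, for example as part of bank-switching code copied to block 1, might look like the Python sketch below. The ROM/RAM split in `is_rom` is an assumption: the text says only that physical memory is divided equally and that bank 0 holds ROM.

```python
BANK_SIZE = 512


def locate(physical_addr):
    """Return (bank, cpu_addr): the bank to select and the block-0 address to use."""
    if not 0 <= physical_addr <= 0xFFFF:
        raise ValueError("physical memory is 64K")
    return divmod(physical_addr, BANK_SIZE)


def is_rom(bank):
    # ASSUMPTION: ROM is the lower half of physical memory (banks 0-63) and
    # RAM the upper half (banks 64-127). The text only says the split is equal
    # and that bank 0 holds ROM.
    return bank < 64


bank, addr = locate(0x1234)
print(bank, hex(addr), is_rom(bank))  # 9 0x34 True
```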

 

Hardware

A custom-designed circuit board holds the Altera Max II EPM570T, 512KB Flash ROM, and 32KB SRAM that form the heart of the computer. A 1.8-inch color TFT on a breakout board serves as the display. A piezo speaker and two LEDs provide opportunities for simple I/O. Headers for JTAG, serial, and a PS/2 keyboard enable connections to other devices or a PC. Because the serial interface and keyboard operate at 5V, a 74LVC08 is used to level shift to 3.3V for communication with the other components. A 20-pin expansion header exposes unused Max II pins to provide additional I/O opportunities.


18 Comments so far

  1. Firsties! - September 15th, 2010 3:31 am

    I can’t believe nobody has posted yet!

    You’ve got an interesting project here, and it’s well-documented to boot! Keep it up.

  2. zerx - November 2nd, 2010 6:45 pm

    Hi, I’m interested in the Tiny CPU, and want to learn CPU architecture from it.
    I have already read the source code several times, but the details of the Tiny CPU instruction set are hard to understand from the code.
    Would you please give me some documentation about the instruction set and its design?

    Thanks a lot~~~^^!

  3. Steve - November 2nd, 2010 7:02 pm

    If you’re interested in the microarchitecture, see this post: http://www.bigmessowires.com/2010/04/18/tiny-cpu-architecture/

    If you’re interested in the instruction set from a programmer’s perspective, then you’re right, it’s not really documented. It’s a lot like the BMOW instruction set (which is a lot like 6502), but with many instructions and addressing modes omitted.

    Once I finish my current project at work, I hope to have enough time to return to Tiny CPU and finish it.

  4. Gary - May 6th, 2011 8:14 pm

    Great projects Steve,
    Loved the BMOW1 and look forward to TinyCPU or anything you come up with.

    As for me, I’ve been looking at Bitcoin-generating architecture models. There must be a more efficient way than running multiple power-hungry GPUs? Actually there is: FPGAs, and you may be the man to get your head around programming one to act as a Bitcoin hashing core.

  5. Zach - May 11th, 2011 9:31 am

    BMOW 1 was quite cool. I was inspired to get my CPLD-based machine on the web, too. I’m in the process of documenting it and moving all the web stuff over to a real hosting provider so I can get it into the Homebuilt CPU ring.

    You can find it at:
    http://sites.google.com/site/zmetzing/home/toro-clock-project

  6. Joe - October 20th, 2011 8:31 pm

    http://www.praxibetel.org/toro/toro.html
    The Toro Clock Project site moved. In case the old redirect dies, here’s a new link.

  7. Zach - December 4th, 2011 5:34 pm

    Yup, I forgot to update that link! Thanks, Joe.

    I hope to see many others doing their own designs, discrete logic or programmable, in the future. There’s nothing quite like the thrill of watching your CPU execute opcodes that wouldn’t run on any other existing machine.

  8. MacDoogie - January 10th, 2012 11:05 am

    Hey, I remember doing a “tiny” RISC CPU in an Altera device for my senior design project back in ’98! It had 16 instructions: 8 ALU instructions and 8 non-ALU instructions. I targeted an EPM9320 that Altera provided to me as a sample. It was supposed to be an 8-bit CPU, but it wouldn’t fit in the 9320, so instead of paring down the instruction set, I “scaled” it down to a 4-bit proof of concept. Actually, my concept was to prove that it could be scaled upward, but ironically I ended up scaling it down. I utilized 98% (yes, ninety-eight percent!) of the CPLD. Typical utilization was around 80% back in the day, but the 9000-series CPLDs had a cool feature where you could split the register bits from the AND/OR tree. This worked out well for a CPU design, since a CPU needs both register banks and separate muxing logic for the internal buses.

    I still have the prototype board I made using a toner transfer system. I drew the board manually, pure art style (no autorouting tools), in Adobe Illustrator 7 on a Power Macintosh 7200. I had a ribbon cable running to a breadboard with RAM and ROM to run a simple program that exercised all of the instructions. A PCB trace bridge between two “Reserved” pins, visible only under a microscope, kept the thing from working properly, and it took me two days and the stereo scope at the lab where I was interning to figure out that problem. Then, when I was demoing it to the professor on the day grades were due (one hour before the deadline), the LabMate I was using to generate one-shot clocks from a spring-loaded toggle switch kept double-clocking and skipping instructions, which made the prof question my design. After trying to debounce it with capacitors didn’t work, we finally ran a 1Hz oscillator into the clock to prove the issue was not in my design. At the end of it all, my professor said, “That’s all you can fit in an FPGA? I’m not impressed!” Still, I got an A+ after all the hassle and got to graduate 🙂 Sorry to regale you all with my tale, but the Tiny CPU project fired a neuron of projects long past 😉

  9. freddy ferrer - January 11th, 2012 6:37 pm

    Hi friend, your project is very interesting. I’d like to know whether you have ever developed your own graphics engine 😀 Thanks in advance.

  10. freddy ferrer - January 14th, 2012 2:58 pm

    Hi, I’d like to know whether I can use the electronic components of a PlayStation 1, such as the processor and things like that, to build a homebrew PC 😀

  11. janrinze - September 28th, 2013 3:20 am

    Nice project!
    I wonder how much logic is available in a CPLD. Could this design be extended to a 32-bit CPU?
    Did you need to trim it down because of logic element constraints?

  12. Steve Chamberlin - September 28th, 2013 6:34 am

    Exactly. A CPLD typically only has a few hundred macrocells (logic elements), so the challenge was to simplify the CPU to fit that constraint. With a large CPLD or an FPGA, you could make a more full-featured CPU with a larger address and data size. The OpenRISC softcore CPU is one example: http://opencores.org/or1k/OR1200_OpenRISC_Processor

  13. barrym95838 - June 25th, 2014 11:09 pm

    Pretty cool!

    How does STA imm (opcode $15) work? Does it cause some kind of twisted self-modification, like on the 6800?

    I would try to figure it out myself, but I can’t read Verilog yet.

    Thanks,

    Mike

  14. Steve Chamberlin - June 26th, 2014 7:28 am

    Hey, that is an excellent question! How the heck did STA imm get in there? It makes no sense – I’m not sure what I was thinking. Looking briefly at the Verilog, I think it might treat the last 10 bits of the instruction as both an immediate value and an address in which to store it.

  15. Charles - December 15th, 2014 11:58 am

    This is pretty cool. I can’t believe that IC has all this in it!

    Tiny Device implements bank switching,
    address decoding,
    a PS/2 keyboard interface,
    serial input and output,
    a parallel LCD driver,
    tick counter,
    clock division,
    a general-purpose parallel port,
    and an I/O status register.

    Sick… I’m thinking I want to go in this direction… might give the Mac hobby a break. There is just so much for me to learn here.

  16. Charles - January 19th, 2015 4:02 am

    Is there a YouTube video of this running? This thing is cool as hell…
    It’s got all the goodies.

    Did you ever get more than stripes on the screen?

  17. Arthur - May 11th, 2015 3:57 am

    Hi, do you have the hardware’s schematic and board (sch & brd) files, and could you share them with us?

  18. Steve Chamberlin - May 11th, 2015 6:53 am

    They’re not in the file archive (see link above), since the focus is really on the CPU and not the experimental board that uses it. If you really want the board files though, I can dig them out and send them to you.
