
128 Is Not Enough

With more work, I was able to squeeze the design down further, to 77 macrocells! You can grab the latest version of the Verilog files here.

Unfortunately, when I then started to add the X register and its related functionality, things turned ugly very quickly. Just adding an X register that could be loaded, stored, incremented, and decremented ballooned the macrocell count to 111. I went back and forth over it several times, trying various optimizations. I even tried a major reorganization of the whole design, moving all the math ops to the ALU and introducing explicit datapaths instead of relying on Verilog to infer them, but nothing helped much. And I can’t really go all the way to 128 macrocells anyway, due to routing congestion and other constraints. Even the 111-macrocell version does not actually fit the Max 7128.

I’m still going to try a few other changes to see if I can improve the resource usage enough, but I know that adding indexed addressing and a stack is going to require lots more resources, so I’d likely just be postponing the inevitable. I think I will end up with three choices:

  1. Be satisfied with what I have now, add a few more little things, and call it done. That would mean giving up indexed addressing and a stack, making the CPU far less capable.
  2. Start over from scratch with a new, detailed design that accounts for exactly how it will be synthesized. Essentially, compute by hand the logic equations that should be implemented in each macrocell, so I can know exactly what’s happening and where the resources are going, and hope to use them more efficiently.
  3. Drop the idea of a 6502-lite CPU, and implement some entirely different design with a simpler architecture, like a stack machine.

At the moment, none of those options really appeal to me. We’ll see…


3 Comments so far

  1. Steve - March 31st, 2010 6:09 am

    I thought of a 4th route, which is sort of like the first, but feels less like failure: eliminate all the CPU instructions and functions that aren’t very valuable, relative to the amount of resources they require and how frequently they’re used in typical programs. For example, the V (overflow) flag is rarely used, but is complicated to compute, and it can be computed by a software subroutine in cases where it’s really needed. Subtraction is just complement and addition. Etc.

  2. Tom - April 1st, 2010 7:20 am

    Have you looked at other CPLD architectures? Altera’s MAX II and Lattice’s MachXO are CPLDs based on LUTs instead of macrocells, which should allow much easier routing.

  3. Steve - April 1st, 2010 10:40 pm

    I’m intentionally restricting myself to older device families that come in a PLCC package, so I can easily hand-solder a working computer when I’m done with the design. Also, being starved for resources makes it an interesting challenge!
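To make the "4th route" from the first comment concrete, here is a rough software model of the two tricks it mentions: subtraction done as complement-and-add, and the V (overflow) flag computed after the fact instead of in hardware. This is only a sketch in Python, not the Verilog from the design. The function names (`add8`, `sub8`, `overflow_after_add`) are made up for illustration, and the 6502-style convention that carry-in = 1 means "no borrow" is an assumption.

```python
MASK = 0xFF  # model an 8-bit datapath


def add8(a, b, carry_in=0):
    """8-bit add with carry. Returns (result, carry_out)."""
    total = (a & MASK) + (b & MASK) + (carry_in & 1)
    return total & MASK, total >> 8


def sub8(a, b, carry_in=1):
    """Subtract by complementing b and adding: a - b == a + ~b + 1.

    Assumes the 6502-style convention where carry_in = 1 means "no borrow".
    Only the adder is needed; no separate subtractor.
    """
    return add8(a, (~b) & MASK, carry_in)


def overflow_after_add(a, b, result):
    """Software V flag for an addition a + b = result.

    Signed overflow happened if both operands have the same sign bit
    and the result's sign bit differs from it.
    For a subtraction, pass the complemented operand as b.
    """
    return ((a ^ result) & (b ^ result) & 0x80) != 0


if __name__ == "__main__":
    result, carry = sub8(5, 3)
    print(hex(result), carry)
    result, _ = add8(0x7F, 0x01)
    print(hex(result), overflow_after_add(0x7F, 0x01, result))
```

The point of the model is that the overflow check needs only the two operands and the result, so a program that really needs V could compute it with a few XOR and AND instructions, and the hardware would only have to provide an adder and a complement operation.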
