BMOW title
Floppy Emu banner

Archive for August, 2013

Nibbler Software Tools

During development of the Nibbler CPU, I’ve created several software tools to help make my life easier. Some of these have been nearly as interesting as the CPU itself! While professionally I work with C++ or Java, for my personal projects I find the .NET languages like C# to be most convenient. They include a robust set of libraries for common data structures and I/O, and also make it a snap to build a GUI. To create my tools I use the free Visual Studio Express IDE, running on Windows. There are plenty of other development options like Eclipse and the GNU tools, but I’m already familiar with the Visual Studio IDE, and the $0.00 price is hard to beat!
Microcode Assembler

The simplest tool is the Microcode Assembler, which doesn’t actually assemble anything, but instead generates two binary files for the contents of the two microcode ROMs. The program is written in the version of C++ that uses managed code and the .NET foundation. I’ve never been entirely clear what the proper name for it is: Managed C++, or Visual C++, or C++/CLI?

In the beginning I intended to describe the microcode operations in a text file, using register transfer language, and then the Microcode Assembler would assemble it into binary. But when it became clear the microcode would be fairly simple, I dropped the idea of using RTL, and just wrote some C++ code to directly generate the required binary data. Using the microcode table from my previous post, this was a quick task. The program contains lots of bit-shifting fun like:

    // 9 JNE
    for (int c = 0; c < 2; c++)
        microcode[(9 << 3) | (c << 2) | (0 << 1) | 1] = 0x383F; // NE
        microcode[(9 << 3) | (c << 2) | (1 << 1) | 1] = 0xF83F; // E

Whee! Those hexadecimal numbers are just rows from the microcode table, with the 16 control signals expressed as a 4-nibble hex value. Nothing very exciting here, but it gets the job done.

I needed some way to write programs for the CPU, using symbolic instruction names and branch labels, instead of stringing together opcode values by hand. Woz supposedly hand-assembled most of the Apple II ROM routines, which is damn impressive, but not really something I wanted to duplicate.

I considered taking an existing open source assembler and modifying it for my needs, but as it turned out, I already had a suitable assembler that I’d previously written for my Tiny CPU project. With a few hours of work, I was able to adapt that assembler for Nibbler’s purposes.

Unlike the other tools, the assembler is written in vanilla C++, and runs as a command line program. It takes a single .asm file as input, and assembles the code into a .bin binary output file. It also generates a .sym symbol file, containing the values of all the labels and constants in the code, as well as the assembled address of each line of code. The symbol file is used later by the simulator, in order to perform source-level debugging.

The assembler doesn’t have any fancy features like macros or conditional compilation, but it does support:

  • Decimal constants
  • Hex constants, preceded by $
  • Character constants, contained in ‘ ‘ single quotes
  • < and > operators to extract the high or low nibble of a byte constant
  • Named constants using #define (both data and address constants)
  • Named labels, and jumping to a named label
  • Unnamed + and – labels, for jumping forward and backward
  • Comments, starting with ;

Here’s a snippet from an example .asm file, showing the kind of code the assembler can handle. This snippet watches the input buttons, and writes a character to the LCD whenever a button transition occurs.

    ; example.asm
    ; memory locations
    #define PREV_BUTTON_STATE $00D
    #define NEW_BUTTON_STATE $00C
    #define RETURN $00F
    #define LCD_OUT_H $000
    #define LCD_OUT_L $001
    ; buttons
    #define BUTTON_LEFT $1
    #define BUTTON_NOT_LEFT $E
        ; reset the button state
        lit #$F
        ; wait for a button state change
        -   in #0
        cmpm PREV_BUTTON_STATE
        je -
        ; save the new state, and determine what changed
        ; is left button currently pressed?
        lit #BUTTON_NOT_LEFT
        norm NEW_BUTTON_STATE
        cmpi #BUTTON_LEFT
        jne check_left_released
        ; was it previously unpressed?
        lit #BUTTON_NOT_LEFT
        norm PREV_BUTTON_STATE
        cmpi #BUTTON_LEFT
        je check_right_pressed
        ; left button changed from unpressed to pressed: print an upper-case letter L
        lit #<'L'
        st LCD_OUT_H
        lit #>'L'
        st LCD_OUT_L
        jmp print_button_change
        ; was left button previously pressed?
        lit #BUTTON_NOT_LEFT
        norm PREV_BUTTON_STATE
        cmpi #BUTTON_LEFT
        jne check_right_pressed
        ; left button changed from pressed to unpressed: print a lower-case letter L
        lit #<'l'
        st LCD_OUT_H
        lit #>'l'
        st LCD_OUT_L
        jmp print_button_change
        ; ...handle the other buttons
        lit #1
        st RETURN ; store 1 at RETURN, lcd_write uses this to know it should jump to next1 when it's finished
        jmp lcd_write ; writes the character at LCD_OUT_H, LCD_OUT_L to the LCD
        ; update the previous button state to the new state
        jmp print_button_changes


The most complex tool by far is the machine simulator. Originally developed for BMOW, this GUI-based tool is written in Managed C++, and simulates the data and control paths of the CPU. It supports source level symbolic debugging, disassembly, microcode debugging, code breakpoints, data breakpoints, memory inspection, and I/O simulation of the LCD and input buttons. It’s not perfect, but it’s a sweet tool that makes diagnosing problems dramatically easier than it would be otherwise. And who doesn’t love simulation?

The simulator uses the assembled program binary, the program source and symbol files (if available), and the microcode ROM binaries. Because it uses the microcode to control the data paths, it needs no special knowledge of Nibbler’s instruction set in order to do the simulation: it’s just a dumb series of paths governed by control signals, same as the real hardware. But for the sake of code disassembly and debugging, there is some knowledge of the Nibbler instruction set built in.

In the image above, the simulator is doing source level debugging of a program similar to the previous button watcher example. Execution is currently stopped at the CMPM PREV_BUTTON_STATE instruction, as shown by the yellow arrow. There’s a breakpoint two instructions further down, at the line with the red circle. If the simulator is started and a sim-button is pressed, the program will break out of the loop and stop at the breakpoint. Yahoo, interactive debugging!

From right to left, the simulator control buttons are:

  • Run – Keep simulating until explicitly paused, or a breakpoint is hit
  • Pause – Momentarily stop the simulation
  • Microstep – Simulate one clock cycle, then pause
  • Step – Simulate to the start of the next instruction, then pause
  • Reset – Set the program counter to 0

The upper-left shows the current CPU state, and the contents of RAM. Below those is a 16×2 character LCD display, using a partial simulation of the LCD’s HD44780 controller module driven by the OUT0 and OUT1 ports.

At the bottom left is a microcode “disassembly” of the current instruction. In this case for the instruction CMPM, phase 0 uses the PC as an address into ROM, retrieving a byte which is stored in the Fetch register. Phase 0 also increments the PC. Phase 1 uses the ALU to calculate A minus RAM(addr), where the RAM address is formed from the immediate opcode value and the current program ROM byte, as described in previous posts. It also loads the C and E flag registers with the result of the ALU calculation, and increments the PC.

For all its capabilities, the simulator won’t catch every problem that might occur on the real hardware. It’s purely a dataflow simulator, and has no concept of timing at a finer resolution than one clock cycle. If there’s a race condition somewhere, or a timing requirement that isn’t met, the simulator won’t detect the problem. It’s also too forgiving of microcode errors that would break the real hardware, like enabling multiple sources to drive the data bus as the same time. But despite these shortcomings, I’ve found the simulator to be an incredibly helpful tool. And it’s a huge confidence boost to see everything running in the simulator, before building the actual hardware. 🙂


Read 3 comments and join the conversation 

Programming the 4-Bit CPU

Instructions for the Nibbler 4-bit CPU come in two types: immediate and addressed. Both types begin with 4 bits of instruction opcode, to identify the specific instruction. Immediate instructions like ADDI include a 4-bit operand value embedded in the instruction, and so are 8 bits in total size. Addressed instructions like JMP include a 12-bit address in data space (RAM) or program space (ROM), and are 16 bits in total size. The instruction encoding is very simple:

The four instruction opcode bits i[0..3] are combined with the ALU carry flag C and equal flag E, as well as the Phase bit, in order to form a 7-bit address for the two microcode ROMs. The ROMs’ outputs form the 16 control signals needed to orchestrate the behavior of all the other chips. The contents of the microcode ROMs are shown in the table below.


The first line of the table shows that phase 0 is the same for all instructions. That’s good, because at this point the instruction hasn’t been fetched from memory yet, so the CPU doesn’t even know what instruction it’s executing! Nothing interesting happens as far as the microcode is concerned, while the Fetch register is loaded with the program ROM byte, and the program counter is advanced to the next byte.

Phase 1 is where all the interesting work happens. ADDI is a good example of a typical instruction. Its instruction opcode is 1010 binary, or $A hex. For this instruction, the same control signals will be asserted regardless of the C or E flags. /oeOprnd will be 0, enabling the bus driver to drive the immediate operand value onto the data bus, where it connects to one of the ALU inputs. The other ALU input is always connected to the accumulator register. The /carryIn, M, and S[0..3] control signals will be set to put the ALU into addition mode, with no carry-in. /loadA will be 0, so the ALU result will be stored to the accumulator. /loadFlags will also be 0, so the carry and equal flags will be updated with the results from the addition.

JE (jump if equal) is another good example. Its instruction opcode is 1000 binary, or $8 hex. If the E flag is 1, then incPC will be 0 (PC will not be incremented) and /loadPC will be 0 (PC will be loaded with the new address). So if E is 1, the jump will be taken. If the E flag is 0, then incPC will be 1 (PC will be incremented) and /loadPC will be 1 (PC will not be loaded). So if the E flag is 1, the CPU just skips over the destination address byte and advances to the next instruction, without taking the jump. Jump instructions don’t involve the ALU, so the ALU control signals are unimportant and are shown in gray.

Notice that some gray control signals are labeled “dc” for don’t care, but others have a 0 or 1 value. This is because I’ve chosen specific values for those dc’s in order to accomplish another goal. Look carefully, and you’ll see that /carryIn is everywhere equal to i3, and M is everywhere equal to i2. That means /carryIn and M don’t really need to be in the microcode at all: anything that needs them can just connect to i3 and i2 instead. This idea could probably be extended to more bits, but I stopped at two. If you look at the circuit schematic I posted previously, you’ll see that I’m not currently taking advantage of this, but it could be helpful later if I find a need to add another control signal or two.

Instruction Set

With only 4 bits for the instruction opcode, there’s a maximum of 16 different instructions, so the choice of which instructions to implement is challenging. It would be possible to have more than 16 instructions, but only by shrinking the address size or immediate operand size. Since this is a 4-bit CPU, a 4-bit instruction size just makes sense.

There are five types of jump instructions:

JMP – unconditional jump
JC – jump if carry
JNC – jump if no carry
JE – jump if equal
JNE – jump if not equal

This is a good set of jump instructions for convenient programming. Although they’re convenient, the negative jumps JNC and JNE aren’t absolutely necessary, because you can always rewrite them with a positive jump and an unconditional jump:

JNC noCarry
   do stuff
   do other stuff


JC carry
JMP noCarry
   do stuff
   do other stuff

LD and ST load a nibble from data space (RAM) to the accumulator, or store a nibble from the accumulator to data space.

LIT loads a literal immediate nibble value to the accumulator. It might also have been named LDI.

OUT stores a nibble from the accumulator to one of the eight output ports, while IN loads a nibble from one of the eight input ports to the accumulator. These instructions are intended to be used for I/O.

ADDI adds a literal immediate nibble value to the accumulator, while ADDM adds a nibble in data space to the accumulator. In either case, the carry-in for the addition is 0, and the carry-out and equal flags are set by the results of the addition.

CMPI compares a literal immediate nibble value to the accumulator (subtracts the value from the accumulator), while CMPM compares a nibble in data space to the accumulator. The accumulator value is not modified, but the carry-out and equal flags are set by the results of the subtraction. Because the ALU is performing a subtraction, the carry-in is always 1 in order to make the math work out correctly.

NORI performs a NOR with an immediate value, NORM performs a NOR with a data space value. The carry-out and equal flags are set by the results of the NOR.
Synthetic Instructions

What about the instructions that aren’t here, like subtraction, AND, OR, NOT? Happily, all of these can be synthesized from the existing instructions. The assembler I’m working on includes some of these as macros, so they can be used as if they were built-in instructions.

ORI x: NORI x; NORI #0
ORM x: NORM x; NORI #0
ANDI x: NORI #0; NORI ~x
SUBI: ADDI (~x+1)

The only common instructions that can’t be trivially synthesized are ANDM and SUBM, because they require the use of a temporary register to hold the complement of the data space value. A dedicated location in data space could be used for this purpose, since Nibbler only has one register.

In many cases, with some thought you can eliminate the need for these synthetic instructions entirely. Take the common case of checking a specific bit in an input. On a CPU with an ANDI instruction, this might look like:

#define BIT2 $4 ; $4 = 0100
IN #0 ; load accumulator with nibble from port 0
JNE bit2is1

With Nibbler, you might think to rewrite this using the synthetic ANDI mentioned above:

#define NOT_BIT2 $B ; $B = 1011
IN #0 ; load accumulator with nibble from port 0
JNE bit2is1

But this can be shortened by simplifying the two negatives NORI and JNE into a single positive JE:

#define NOT_BIT2 $B ; $B = 1011
IN #0 ; load accumulator with nibble from port 0
JE bit2is1

More to Come

Next time I’ll post more about the software tools I’ve written for simulating Nibbler and assembling programs. Until then, questions and comments are always welcome!

Read 14 comments and join the conversation 

Custom 4-Bit CPU Schematic and Control

Enough with the vague design talk – here’s the circuit schematic for the Nibbler 4-bit CPU! Click the image to zoom in to a full size view. The whole system fits on a single page, including the CPU itself and the I/O devices, so it’s easy to wrap your head around.

Except for RAM and ROM, all the chips shown here are common 7400 series parts. I haven’t selected a logic family yet, but most likely they’ll be 7400HC or 7400HCT, which require less power while offering similar speed to the more common 7400LS family.

Program Data

The parts on the schematic are arranged in the same relative positions as in the architecture diagram from my previous post. At the middle-right is the program ROM, where the currently running program is stored. This is an 8Kx8 EEPROM, but Nibbler’s address size only allows for 4K programs, so one of the address inputs is unused and is hard-wired to 0. Program memory is 8 bits wide, and so all 8 of the ROM’s I/O lines are used. Depending on the type of instruction, these may be 4 bits of instruction opcode and 4 bits of immediate operand, or 4 bits of instruction opcode and 4 bits of address, followed by 8 more bits of address. At the start of execution of each instruction, this program byte is loaded into the Fetch register.

The address of the program instruction that’s currently being executed is stored in the program counter. The PC consists of three ‘163 4-bit counters, chained together to make a 12 bit logical register. After most instructions, the PC will increment to point to the next instruction. For jump instructions, the PC can also be loaded with a new address.  The address comes from the Fetch register operand value (highest 4 bits) and the program ROM byte (lowest 8 bits).

Control and Microcode

At the top left of the schematic are the three chips pertaining to the execution of the current instruction. The Fetch register is a ‘377, an 8-bit register that holds the current instruction opcode in the high 4 bits and instruction or address data in the low 4 bits. ALU flags are stored in the 4-bit Flags register, a ‘173. There are only two flags, carry and equal, so two of the four bits are unused. The last chip in this group is a ‘175, a quad flip-flop. One flip-flop is used to synchronize the reset signal, and another is the Phase bit, which constantly toggles between 0 and 1 to indicate which of the two clock cycles of an instruction’s execution is currently underway. Fetch is loaded at the end of the clock cycle when Phase is 0. The other two flip-flops are unused.

With two chips that are only half-used, is there a way to combine the functions of the ‘173 and the ‘175 into a single chip? Probably not: flip-flops load data on every clock, but the ‘173 needs a load enable for the ALU flags.

The instruction opcode, ALU flags, and phase are combined to form a 7-bit address for the two microcode ROMs, shown at the mid-left. The output of the two ROMs constitutes the 16 control signals needed to orchestrate the behavior of all the other chips. The microcode is stored in two 2Kx8 EEPROMs, so four of the eleven address inputs on each ROM are unused and hard-wired to 0.

ALU Datapath

At the bottom-left of the schematic are the ‘181 ALU and the ‘173 accumulator register “A”. The ALU (arithmetic and logic unit) can perform any common arithmetic or logical operation on its two inputs. In this case, one input always comes from the accumulator, while the other is supplied from the data bus. The ALU result is stored back into the accumulator. The ALU, accumulator, and data bus are all 4 bits wide, which is what makes Nibbler a 4 bit CPU.

Carry-In and Carry Flag

If you look carefully, you’ll see that the ALU’s carry-in bit is a control signal provided by microcode, not the carry flag from the Flags register. This is a subtle but important point: the carry flag is an output from an arithmetic instruction, and can be used to make a conditional jump if the carry flag is/isn’t set, but it doesn’t feed back into the ALU to affect later calculations. This means that when performing multi-nibble pair-wise additions, the program must check the carry flag after each nibble addition, and add an extra 1 into the next addition if it’s set.

This was a conscious design choice. If the carry flag did connect to the ALU’s carry-in bit, then the program would need to clear it before performing any single-nibble additions, and those are much more common than multi-nibble additions. Also the carry-in bit can’t simply be hard-wired to 0, because as you’ll see later, the CMP (compare) instruction requires carry-in to be 1 in order to work properly. So carry-in must be provided by the microcode.


RAM is shown at the bottom-center. Its I/O lines are connected to the data bus, and the address comes from the Fetch register operand value (highest 4 bits) and the program ROM byte (lowest 8 bits). Ideally the system would use a 4Kx4 SRAM, to match Nibbler’s address size and data width, but the closest match I could readily find was a 2Kx8 SRAM. That means there will only be 2048 addressable nibbles instead of 4096, and half of the RAM I/O lines will be unused.

Notice the CLK signal is connected to the RAM’s /CE (chip enable) input. This means the RAM will only be enabled during the second half of each clock cycle. This is a simple way of preventing erroneous writes to RAM during the early part of the clock cycle, when the /WE (write enable) signal and RAM address may not yet be valid.

IN and OUT Ports

The IN and OUT ports are also connected to the data bus, and are shown on the schematic at bottom-right. IN0 is a ‘125 4-bit bus driver, which outputs the state of four pushbuttons connected to pull-up resistors. Because there’s only a single IN port, no decoding of the port number is done, and this ‘125 will actually respond to any port number with the IN instruction. If more IN ports were added, then additional port number decoding logic would be needed.

The two OUT ports are ‘173 4 bit registers. OUT1 connects to databus[4..7] of a 16×2 character LCD display using the common HD44780 controller. Although this LCD controller has an 8 bit interface, it can also operate in 4 bit mode, in which case only the highest 4 LCD databus lines are used. OUT0 connects two more lines to the LCD, for the RS and E signals needed to control LCD data transfers. The other two lines from OUT0 connect to an LED, which can be toggled on/off as a basic debugging aid, and to a speaker, which can be bit-banged in software to generate simple square-wave tones at different frequencies.

Notice that the ‘173s have two load enable inputs, /G1 and /G2, and both must be low in order to load data to the chip. /G1 of both chips is connected to the /LOADOUT control signal. But as with the IN port, the OUT port number is not fully decoded, in order to avoid needing extra decoding logic. Instead, bit 0 of the port number is connected to OUT0 /G2, and bit 1 to OUT1 /G2. This means that OUT0 will actually respond to any port number where bit 0 is 0, and OUT1 to any port number where bit 1 is 0. It would even be possible to load both OUT ports simultaneously by using a port number where both bits 1 and 0 were 0, although that probably wouldn’t be useful.

Bus Drivers

The last two components on the data bus are a pair of 4-bit bus drivers, shown at the center and at the bottom-center of the schematic. These are two halves of a single ‘244 octal driver. One drives the ALU result onto the data bus, which is necessary when storing data to RAM or an OUT port. The other drives the operand value from the Fetch register onto the data bus, which is necessary for instructions that involve an immediate constant value.

More to Come

Next time I’ll post more details about the control signals, microcode, and instruction set. Until then, questions and comments are always welcome!


Read 15 comments and join the conversation 

Hello Nibbler!

Say hello to Nibbler, the 4-bit homemade CPU! Ever since I built BMOW1, people have written to me asking how to make their own homebrew computers. BMOW is a complex design that can be difficult to comprehend, so I decided it was time to create a minimal CPU that’s easy to understand, easy to build, but still capable of running interesting programs. Ideas for Nibbler began percolating in my brain, and after a few weeks of pencil sketches and hand simulation, it’s finally ready to share. And if you’ve forgotten, a nibble is half a byte or 4 bits, so the name fits the CPU.

Some of you may be thinking: “4-bit CPU? BORING!” I agree that many of the 4-bit CPU designs on the web aren’t very exciting, though that’s not an inherent problem with their 4-bitness, but is caused by shortcomings in the computer that surrounds the CPU. Most designs are limited to 256 nibbles of memory, which just isn’t enough to fit a program that does anything very interesting. I/O is often limited to basic LEDs and switches, further reducing the scope of what’s possible.

My goals for Nibbler are:

  • Only use commonly-available 7400 series chips and RAM/ROM. No programmable logic or other goodies.
  • Keep the total number of chips as few as possible.
  • Employ a simple, straightforward design that’s easy to understand.
  • Maintain a clean logical separation between the CPU and the computer surrounding it.
  • Run interesting, interactive programs involving several I/O devices.
  • NOT: Be the most powerful CPU, or the easiest to write programs for.



The architecture of Nibbler is shown above. The CPU core is just eleven 7400 series chips, plus the clock crystal. RAM and ROM add two more chips, and peripheral I/O in “the computer” adds three more, for a total of sixteen chips overall. Compared to BMOW’s 65 chips and multiple clocks, that’s very lightweight.

Instruction opcodes are 4 bits wide, which allows for 16 possible types of instructions. All instructions require exactly two clock cycles to execute. During the first clock cycle, called phase 0, the instruction opcode and operand are retrieved from memory and stored in a register called Fetch. The second clock cycle, called phase 1, performs the calculation or operation needed to execute the instruction.

A pair of microcode ROMs is used to generate the sixteen internal control signals needed to load, enable, and increment the other chips in the CPU at the appropriate times. The microcode ROM address is formed from the instruction opcode, the phase, and the CPU carry and equal flags. Each microcode ROM outputs a different group of eight of the sixteen total control signals.

A load-store design is used, with all arithmetic and logical computation results being stored into the single 4-bit accumulator register named “A”.  Data can be moved between A and memory locations in RAM, but otherwise all the CPU instructions operate only on A. This greatly simplifies the hardware requirements, at the cost of some decrease in flexibility when writing programs.

In contrast to most modern CPUs, the Nibbler design uses a Harvard Architecture. That means programs and data are stored in separate address spaces, and travel on separate busses. The data bus is 4 bits wide, as one should expect for a 4-bit CPU. The program bus is 8 bits wide: 4 bits for the instruction opcode, and 4 bits for an immediate operand.

Program and data addresses are both 12 bits wide, resulting in total addressable storage of 4096 bytes for programs and 4096 nibbles for data. A 12 bit program counter holds the current instruction address. Since instruction opcodes are 4 bits wide, that makes instructions involving absolute memory addresses 4 + 12 = 16 bits in size, or two program bytes.

Nibbler is notable for a few things it does NOT have. There’s no address decoder, because there’s not more than one chip mapped into different regions of the same address space. Program ROM occupies all of the program address space, and RAM occupies all of the data address space. As you’ll see later, I/O peripherals aren’t memory-mapped, but instead use port-specific IN and OUT instructions to transfer data.

Nibbler also lacks any address registers, which means it can’t support any form of indirect addressing, nor a hardware-controlled stack. All memory references must use absolute addresses. That’s a significant limitation, but it’s in keeping with the project’s K.I.S.S. design goals. With the use of jump tables and dedicated memory locations, a simple call/return mechanism can be implemented without a true stack.

Up to sixteen distinct I/O devices can be supported by the CPU, but the planned I/O devices require just one IN port and two OUT ports. The computer’s input comes from four momentary pushbuttons, arranged in a left/right/select/back cross configuration, and connected to the IN port. Output utilizes one of the two OUT ports, and includes the obligatory LEDs used for debugging, as well as a piezo speaker for software-controlled sound, and a two-line character-based LCD display.

The specific 7400 logic family and chips to be used aren’t yet finalized, but in back of the envelope calculations, it looks like the CPU should support a speed of just over 4 MHz. The longest path is for a write to RAM during phase 1: Clock-to-Q delay for the Fetch register, plus propagation delay for the microcode ROMs, ALU, and bus driver, plus data setup time for the RAM. At two clock cycles per instruction, 4 MHz operation would result in 2 MIPS, which is the same or better than BMOW.

I’ll write more about the instruction set and programming model next time. Until then, if you have any comments or questions, I’d love to hear them!

Read 17 comments and join the conversation