Raspberry Pi GPIO Programming in C

The Raspberry Pi’s 40-pin GPIO connector often gets overlooked. Typical Pi projects use the hardware as a very small desktop PC (RetroPie, Pi-hole, media center, print server, etc.), and don’t make any use of the general-purpose IO pins. That’s too bad, because with a little bit of work, the Raspberry Pi can make a powerful physical computing device for many applications.

 
Raspberry Pi vs Arduino (and other microcontrollers)

Why would you want to use a Raspberry Pi instead of an Arduino or other microcontroller (STM32, ATSAM, PIC, Propeller)? There are loads of “Raspberry Pi vs Arduino” articles on the web, and in my view almost all of them miss the mark. The Pi is not a better, more powerful Arduino. It’s a completely different type of device, better at some tasks, but markedly worse at others.

The Pi is vastly more powerful than something like an Arduino Uno. The latest Pi 3 Model B+ has an 88x faster CPU clock and 500,000x more RAM than the Uno. It also runs a full-fledged Linux operating system, so it’s much easier to create projects involving high-level functions like networking or video processing. And you can connect a standard keyboard, mouse, and monitor, and use it as a normal computer.

But the Pi operating system is also a huge weakness in many applications. There’s no “instant on”, because it takes nearly a minute for the device to boot up. There’s no appliance-like shutoff either – the Pi must be cleanly shut down before power is turned off, or else the operating system files may get corrupted. And real-time bit twiddling of GPIO is mostly impossible, because the kernel may swap out your process at any moment, making precise timing unpredictable.

In theory it’s possible to do bare-metal programming on the Raspberry Pi, eliminating Linux and its related drawbacks for real-time applications. Unfortunately this doesn’t seem to be a common practice, and there’s not much information available about how to do it. So the Pi is probably best for those applications where you need some major CPU horsepower and some kind of GPIO connection to other sensors or equipment, but don’t need precise real-time behavior or microsecond-level accuracy.

 
GPIO and Python?

If you start Googling for “Raspberry Pi GPIO programming”, you’ll quickly discover that most of the examples use the Python language. In fact, this seems to be the most popular way by far to use GPIO on the Pi.

I have nothing against Python, but I’m a C programmer through and through, and the idea of using a high-level language for low-level digital interfaces is unappealing. By one measure, Python is over 300x slower at Raspberry Pi GPIO manipulation than plain C. I’m sure there are applications where it’s OK to throw away 99.7% of potential performance, but I’ll be sticking with C, thank you very much.

I spent a little time researching four different methods of Raspberry Pi GPIO manipulation in C. This involved reading documentation and data sheets, and examining the source code of various libraries. I haven’t yet tried writing any code using these methods, so take my impressions accordingly.

If any of the authors of these C libraries happen to read this – thank you for your work, and please don’t be offended by any criticisms I may make. I understand that creating an IO library necessarily involves many tradeoffs between simplicity, speed, flexibility, and ease of use, and not everyone will agree on the best path.

 
Direct Register Control – No Library

The GPIO pins on the Raspberry Pi can be directly accessed from C code, similarly to how it’s done on the ATMEGA or other microcontrollers. A few different memory-mapped control registers are used to configure the pins, and to read input and set output values. The only big difference is that the code must first call mmap() on /dev/mem or /dev/gpiomem, to ask the kernel to map the appropriate region of physical memory into the process’s virtual address space. If that means nothing to you, don’t worry about it. Just copy a couple of dozen lines of code into your program’s startup routine to do the mmap, and the rest is fairly easy.
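
As a rough, untested sketch of that startup code: assuming the /dev/gpiomem device, where offset 0 corresponds to the start of the GPIO register block, the mapping step might look something like this. The names gpio_map and GPIO_MAP_LEN are made up for illustration and don't come from any library.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: one 4 KB page is enough to cover the GPIO register block. */
#define GPIO_MAP_LEN 4096

/* Map the GPIO registers from the given device file (e.g. "/dev/gpiomem").
   Returns a pointer to the first register, or NULL on failure. */
volatile uint32_t *gpio_map(const char *path)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, GPIO_MAP_LEN, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */

    if (p == MAP_FAILED)
        return NULL;
    return (volatile uint32_t *)p;
}
```

A program would call gpio_map("/dev/gpiomem") once at startup and keep the returned pointer. With /dev/mem the same idea should apply, but the final mmap() argument would need to be the physical address of the GPIO block for the particular Pi model, and the program would need root.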

Here’s an example of reading the current value of GPIO 7:

if (gpio_lev & (1<<7)) {
  // pin is high
} else {
  // pin is low
}

Just test a bit at a particular memory address - that's it. This looks more-or-less exactly like reading GPIO values on any other microcontroller. gpio_lev is a memory-mapped register whose address was previously determined using the mmap() call during program initialization. See section 6 of the BCM2835 Peripherals Datasheet for details about the GPIO control registers.

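Setting the output value of GPIO 7 is similarly easy. The set and clear registers only act on the bits that are written as 1 and ignore the bits written as 0, so a plain assignment is enough and no read-modify-write is needed:

gpio_set = (1<<7); // sets pin high

gpio_clr = (1<<7); // sets pin low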

Using other control registers, it's possible to enable pull-up and pull-down resistors, turn on special pin functions like SPI, and change the output drive strength.
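
Putting these pieces together, here is a minimal and untested sketch of pin configuration, output, and input, written in terms of a pointer to the start of the mapped GPIO block. The helper names are invented for illustration, the register offsets are taken from the BCM2835 datasheet, and for simplicity the helpers only handle GPIO 0-31, which covers every pin on the 40-pin header.

```c
#include <stdint.h>

/* Word (32-bit) offsets from the start of the GPIO block; the datasheet
   lists them as byte offsets 0x00, 0x1C, 0x28 and 0x34. */
enum { GPFSEL0 = 0, GPSET0 = 7, GPCLR0 = 10, GPLEV0 = 13 };

/* Configure a pin as an output. Each GPFSEL register holds 3 function
   select bits for each of 10 pins, and the value 001 means output. */
static inline void gpio_make_output(volatile uint32_t *gpio, unsigned pin)
{
    volatile uint32_t *fsel = gpio + GPFSEL0 + pin / 10;
    unsigned shift = (pin % 10) * 3;
    *fsel = (*fsel & ~(7u << shift)) | (1u << shift);
}

/* Drive a pin high or low. GPSET/GPCLR ignore 0 bits, so only this
   pin is affected by the write. */
static inline void gpio_write(volatile uint32_t *gpio, unsigned pin, int high)
{
    gpio[high ? GPSET0 : GPCLR0] = 1u << pin;
}

/* Return 1 if the pin currently reads high, 0 if it reads low. */
static inline int gpio_read(volatile uint32_t *gpio, unsigned pin)
{
    return (int)((gpio[GPLEV0] >> pin) & 1u);
}
```

Because the helpers take the register base as a parameter, they can be exercised against an ordinary array on a desktop machine before being pointed at real hardware.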

Watch out for out-of-order memory accesses! The datasheet warns that the system doesn't always return data in order. This requires special precautions and the use of memory barrier instructions. For example:

a_status = *pointer_to_peripheral_a; 
b_status = *pointer_to_peripheral_b;

Without precautions, the values ending up in the variables a_status and b_status can be swapped. If I've understood the datasheet correctly, a similar risk exists for GPIO writes. Although data always arrives in order at a single destination, two different updates to two different peripherals may not be performed in the same order as the code. These out-of-order concerns were enough to discourage me from trying direct register IO with my programs.
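
One possible way to guard the two reads is sketched below, using GCC's __sync_synchronize() builtin, which issues a full memory barrier. The function name read_two_peripherals is hypothetical, and whether a full barrier at these points is exactly the precaution the datasheet has in mind is an assumption on my part.

```c
#include <stdint.h>

/* Read one status word from each of two different peripherals, with a
   barrier between them so the first read completes before the second
   one is issued. */
static inline void read_two_peripherals(volatile uint32_t *pointer_to_peripheral_a,
                                        volatile uint32_t *pointer_to_peripheral_b,
                                        uint32_t *a_status,
                                        uint32_t *b_status)
{
    *a_status = *pointer_to_peripheral_a;
    __sync_synchronize();   /* finish with peripheral A before touching B */
    *b_status = *pointer_to_peripheral_b;
    __sync_synchronize();   /* finish with B before any later peripheral access */
}
```

The cost is a barrier instruction on every switch between peripherals, which is probably acceptable for occasional status reads but could add up in a tight loop.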

 
WiringPi

WiringPi wraps the Raspberry Pi GPIO registers with an API that will look very familiar to Arduino users: digitalRead(pin), digitalWrite(pin, value). It's a C library, but third parties have added wrappers for Python and other high-level languages. From a casual search of the web, it looks like the most popular way to do Raspberry Pi GPIO programming in C.

WiringPi appears to be designed with flexibility in mind, at the expense of raw performance. Here's the implementation of digitalRead():

int digitalRead (int pin)
{
  char c ;
  struct wiringPiNodeStruct *node = wiringPiNodes ;

  if ((pin & PI_GPIO_MASK) == 0)		// On-Board Pin
  {
    /**/ if (wiringPiMode == WPI_MODE_GPIO_SYS)	// Sys mode
    {
      if (sysFds [pin] == -1)
	return LOW ;

      lseek  (sysFds [pin], 0L, SEEK_SET) ;
      read   (sysFds [pin], &c, 1) ;
      return (c == '0') ? LOW : HIGH ;
    }
    else if (wiringPiMode == WPI_MODE_PINS)
      pin = pinToGpio [pin] ;
    else if (wiringPiMode == WPI_MODE_PHYS)
      pin = physToGpio [pin] ;
    else if (wiringPiMode != WPI_MODE_GPIO)
      return LOW ;

    if ((*(gpio + gpioToGPLEV [pin]) & (1 << (pin & 31))) != 0)
      return HIGH ;
    else
      return LOW ;
  }
  else
  {
    if ((node = wiringPiFindNode (pin)) == NULL)
      return LOW ;
    return node->digitalRead (node, pin) ;
  }
}

That's a lot of code to accomplish what could be done by testing a bit at an address. To be fair, this code does a lot more, such as an option to access GPIO using sysfs (doesn't require root?) instead of memory-mapped registers, and pin number remapping. It also adds a concept of on-board and off-board pins, so that pins connected to external GPIO expanders can be controlled identically to pins on the Raspberry Pi board itself.
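
To make the sysfs path a little more concrete, here is a small standalone sketch of the same rewind-and-read-one-character technique, outside of WiringPi. The function names are invented for this example, and it assumes the pin has already been exported so that its value file exists under /sys/class/gpio.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open the sysfs value file for an already-exported GPIO.
   Returns a file descriptor, or -1 on failure. */
static inline int sysfs_gpio_open(unsigned gpio)
{
    char path[64];
    snprintf(path, sizeof path, "/sys/class/gpio/gpio%u/value", gpio);
    return open(path, O_RDONLY);
}

/* Read the current pin state through an open value file.
   Returns 1 for high, 0 for low, or -1 on error. */
static inline int sysfs_gpio_read(int fd)
{
    char c;
    if (lseek(fd, 0, SEEK_SET) < 0)   /* rewind before every read */
        return -1;
    if (read(fd, &c, 1) != 1)
        return -1;
    return (c == '0') ? 0 : 1;
}
```

Every read here costs two system calls, which gives some idea of why the sysfs route is so much slower than testing a bit in a memory-mapped register.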

From a brief glance through the source code, I couldn't find any use of memory barriers. I'm not sure if the author determined that they're not necessary somehow, or if out-of-order reads and writes are a risk.

WiringPi also includes a command line program called gpio that can be used from scripts (or interactively). It won't be high-performance, but it looks like a great tool for testing, or for when you just need to switch on an LED or something else simple.

 
pigpio

pigpio is another GPIO library, and appears more geared towards simplicity and speed. And yes, it was quite a while before I recognized the name was Pi GPIO, and not Pig Pio. 🙂

Here's pigpio's implementation of gpioRead():

#define BANK (gpio>>5)
#define BIT  (1<<(gpio&0x1F))

int gpioRead(unsigned gpio)
{
   DBG(DBG_USER, "gpio=%d", gpio);

   CHECK_INITED;

   if (gpio > PI_MAX_GPIO)
      SOFT_ERROR(PI_BAD_GPIO, "bad gpio (%d)", gpio);

   if ((*(gpioReg + GPLEV0 + BANK) & BIT) != 0) return PI_ON;
   else                                         return PI_OFF;
}

Here there's no pin number remapping or other options. The function does some error checking to ensure the library is initialized and the pin number is valid, but otherwise it's just a direct test of the underlying register.
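
The BANK and BIT macros quietly depend on a variable named gpio being in scope. As a worked illustration of the arithmetic, here is the same calculation written as two small functions, with names of my own choosing rather than anything from pigpio:

```c
#include <stdint.h>

/* Which 32-bit register of a group holds this GPIO: 0 for GPIO 0-31,
   1 for GPIO 32 and up. */
static inline unsigned gpio_bank(unsigned gpio)
{
    return gpio >> 5;
}

/* The single-bit mask for this GPIO within its bank. */
static inline uint32_t gpio_bit(unsigned gpio)
{
    return (uint32_t)1 << (gpio & 0x1F);
}
```

For GPIO 7 this gives bank 0 and mask 0x80, and for GPIO 35 it gives bank 1 and mask 0x08, so the level of GPIO 35 would be found in bit 3 of the second GPLEV register.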

As with WiringPi, I did not see any use of memory barriers in the source code of pigpio.

 
bcm2835

bcm2835 is a third option for C programmers looking for a Raspberry Pi GPIO library. It appears to have the most thorough and well-written documentation, but also seems to be the least commonly used library of the three that I examined. This may be a result of its name, which is the name of the SoC used on the Raspberry Pi. It's somewhat difficult to find web discussion about this library, as opposed to the chip with the same name.

Like pigpio, bcm2835 appears more focused on providing a thin and fast interface to the Pi GPIO, without any extra options. Here's the implementation of bcm2835_gpio_lev(), the oddly-named read function:

uint32_t bcm2835_peri_read(volatile uint32_t* paddr)
{
    uint32_t ret;
    if (debug)
    {
		printf("bcm2835_peri_read  paddr %p\n", (void *) paddr);
		return 0;
    }
    else
    {
       __sync_synchronize();
       ret = *paddr;
       __sync_synchronize();
       return ret;
    }
}

uint8_t bcm2835_gpio_lev(uint8_t pin)
{
    volatile uint32_t* paddr = bcm2835_gpio + BCM2835_GPLEV0/4 + pin/32;
    uint8_t shift = pin % 32;
    uint32_t value = bcm2835_peri_read(paddr);
    return (value & (1 << shift)) ? HIGH : LOW;
}

The bit position is kept within the range 0-31 by the modulo operation, and the register bank is selected by pin/32, but otherwise there's no error checking on the pin number. The actual read of the GPIO register is performed by a helper function that includes memory barriers before and after the read.

 
Impressions

For my purposes, I would probably choose pigpio or bcm2835, since I prefer a thin API over one with extra features I don't use. Of those two options, I'd tentatively choose bcm2835 due to the format of its documentation and its use of memory barriers. I wish I understood the out-of-order risk better, so I could evaluate whether the apparent absence of memory barriers in the other libraries is a bug or a feature.

Any analysis that looks at just a single API function is clearly incomplete - if you're planning to do Raspberry Pi GPIO programming, it's certainly worth a deeper look at the many other capabilities of these three libraries. For example, they differ in their support for handling interrupts, or byte-wide reads and writes, or special functions like SPI and hardware PWM.

Did I miss any other C programming options for Raspberry Pi GPIO, or overlook something else obvious? Leave a note in the comments.

10 Comments so far

  1.   May 27th, 2018 8:49 pm

    About your first example of memory access order (“Watch out for out-of-order memory accesses!”): Are you sure that the contents of a_status and b_status can be swapped? Sure, the order of the actual accesses may be swapped, so peripheral_b may be read before peripheral_a, but if the contents got swapped, that would be completely broken IMHO.

    Another thing to watch out for is the ordering between read and write accesses, which is also not always guaranteed.

  2. Steve May 27th, 2018 9:04 pm

    That example with a_status and b_status was copy-pasted directly from the Broadcom datasheet, page 7. I agree it seems very strange.

  3. Tim May 28th, 2018 5:03 am

    Haven’t run across the swap problem as my projects tend to be small. I do like the “no library” or “custom library” approach as it minimizes what needs to be changed when porting code from (example) the RPi to a Cubietruck or BBB.

    One big annoyance is with the spec sheets themselves. Some are only available in Chinese or German. Some assume that you already know the shortcomings of the chip. If you’re working with someone’s breakout board, you have to dig to find out what parts of the spec sheet are N/A because the builder specifically hardwired them out (e.g., most of the available FM receiver breakout boards only support I2C (no SPI)). Etc.

  4. Steve May 28th, 2018 8:00 am

    Yes, and speaking of spec sheets, I was surprised by the number of typos and general informality of the Broadcom datasheet for the BCM2835 peripherals (see link in the original post). Every other datasheet I’ve ever read was a very professional document, but this one is full of grammar errors like “each bank has its’ own interrupt line” and “it is theoretical possible”, as well as chatty side-comments like “Not a good idea!” I could understand if the grammar issues were English translation problems, but Broadcom is a US company, and the datasheet reads a bit like a hastily-written college term paper.

    As for datasheets that are only available in other languages, I’ve found that Google Translate does a surprisingly good job translating technical datasheets from Chinese. I was recently reading one such translated datasheet that kept referring to the “caterpillar effect”, which I assumed was some kind of amusing translation error. But in reality it’s a visual artifact of LED matrix refresh, and “caterpillar effect” is exactly what it’s called in English. Score Google Translate 1, me 0.

  5. asdf May 28th, 2018 11:35 pm

    If peripherals are mmap’ed as regular memory (eg. via /dev/mem), you will get all the joys of caching and weak ordering. So memory accesses can indeed be reordered, you may end up reading/writing more memory locations than intended and so forth. There’s unfortunately no flag for mmap to map the memory as I/O (uncacheable, strongly ordered), you need a kernel driver for that. If there’s no existing driver, the Linux UIO driver lets you define a generic device in the device tree, without writing any code.

  6. John May 29th, 2018 3:10 am

    “if the contents got swapped, that would be completely broken IMHO”

    I agree.

    Normally, out-of-order memory access just means that the example code might get peripheral a’s data from a later time than peripheral b’s, despite reading from it first. Most of the time that’s not a problem, so you don’t bother with barriers.

    But not in this case (if I’m reading the datasheet correctly, and it’s insane enough that I still have a small hope that it has just been badly translated). Peripheral access goes over some special bus, and accesses of different peripherals aren’t guaranteed to be in order. In the example, a_status might end up with peripheral b’s data.

    Data for a single peripheral stays in order, so as long as you stick with a single peripheral you’re OK. But access another, and you must have a barrier. And hope that the person who wrote any interrupt handlers that might be active at the time put barriers in too.

  7. Steve May 29th, 2018 7:49 am

    This out-of-order GPIO stuff for RPi is fascinating and bewildering, and I’ve been reading more about it. Thanks to asdf and John May as well. Some tentative conclusions:

    1. The GPIO memory is not cached (at least not for reads). If it were, correct use of the GPIO pins would be impossible. All subsequent reads of a GPIO pin’s state would return the cached value from the first read, instead of the current pin state.

    2. The potential for out-of-order reads getting assigned to the wrong registers is real (the a_status and b_status example above), but exists only when reading from two different peripherals (like GPIO and a hardware Timer/Counter or UART). GPIO is a single peripheral, so out-of-order reads aren’t a problem and memory barriers aren’t generally necessary for code that only uses GPIO. Similarly, out-of-order writes aren’t a problem either when writing strictly to GPIO.

    My mental model (which could be totally wrong) is that each peripheral maintains a FIFO of read requests and write requests from the ARM core, and the FIFO entries are always handled in order. But the different peripherals have their own independent FIFOs, and there’s no guarantee of ordering across them. The ARM core might issue a read request to peripheral A, then a different read request to peripheral B, and eventually a result will be returned to the core, but it won’t know if the result came from A or B.

  8. Loïc June 2nd, 2018 6:59 am

    A good intro to memory barriers and other troubles with compiler and concurrency is the following paper:
    What every systems programmer should know about concurrency, Matt Kline
    https://bitbashing.io/concurrency-primer.html

  9. Steve June 2nd, 2018 11:10 am

    I must admit I still don’t understand it. That paper talks almost exclusively about problems arising from multi-threaded code. What I don’t understand is how such concurrency problems can occur in a single thread, running on a single core. Section 7 of the paper does mention “weakly ordered hardware” but does nothing to explain it at the hardware level, or to describe what problems might arise if memory barriers aren’t used.

    I can understand that if I write some single-threaded code like:

    int foo = *pFoo;
    int bar = *pBar;

    those two assignments might not happen in program order. Bar might actually get loaded from memory before Foo, but that doesn’t really matter. The same goes for writes, where Bar might be written before Foo, but single-threaded code doesn’t care.

    What I can’t understand is how the above code could produce the *wrong result*, with the value at pBar somehow getting assigned to the variable Foo. That’s what the Broadcom datasheet seems to say could happen, but I can’t visualize what kind of hardware design would make that possible. It just seems completely broken. Programming in such an environment seems like it would be virtually impossible, without having to surround every single statement with a memory barrier.

  10. Steven Clark June 3rd, 2018 8:18 pm

    If you really need speed and access to memory mapped IO registers you should probably just build an upper portion of your application into a kernel module.
    mach/platform.h provides the GPIO_BASE address for whichever version of the SoC you’re using. And the high resolution timer interface should let you get by without figuring out how the interrupt vector’s being used for some applications. It’s certainly better than running on busy loops or jiffy-precision timers (unless you can get a DMA system driving your peripheral registers, I haven’t yet, maybe in the future).

    linux/miscdevice.h makes it easy to get one or more character devices in /dev

    Breaking the whole permission system to give a user space process access to all of memory kinda voids the whole idea of having an OS or security of any sort.
