/var/log/andrey

Button (or other GPIO pin) debouncing

(this is mostly a repost from back in 2011 on my old blog but I keep referring to it so I wanted it in an easier place to find)

GPIO pin de-bouncing is a fairly common task and there are many good ways to implement it. Here’s how I handle it on most projects; I think it’s fairly clean and easy to adapt to small microcontrollers or even some larger systems.

Each pin that needs to be sampled and debounced can be represented with a state machine consisting of a state, value, and counter. A pin is either idle (whether pressed or released) or in transition to being pressed or released. To transition to the idle state, the pin must maintain the same level for a number of counts.

A pin can therefore be represented something like:

enum pin_state {
        PIN_IDLE       = 0,
        PIN_PRESSING   = 1,
        PIN_RELEASING  = 2,
};

struct pin {
        enum pin_state state;
        char pressed;
        unsigned char debounce;
        unsigned char debounce_threshold;
};

I use three states but with the combination of ‘state’ and ‘pressed’ and ‘debounce’ we really have four real states (idle-pressed, idle-released, pressing, and releasing).

At initialization time, the pin structure(s) should be set to zero. The ‘pin’ structure should also contain information about the pin to enable a routine to check its value (for example the GPIO port and pin number). We then poll the pin or pins in a thread or main loop. For example, to check just one pin:

#include <string.h>

static struct pin pin;

void init(void)
{
        /* (if needed) */
        memset(&pin, 0, sizeof(pin));
        /* pick some reasonable threshold, this is a
           factor of your circuit and polling
           frequency */
        pin.debounce_threshold = 10;
}

void check_pins(void)
{
        /* invert this if the pin is active-low, as is common
           for buttons, we treat a '1' as 'active' */
        char cur = gpio_get_pin_value();

        switch (pin.state) {
        case PIN_IDLE:
                if (cur != pin.pressed) {
                        pin.state = cur ?
                                PIN_PRESSING : PIN_RELEASING;
                }
                break;
        case PIN_PRESSING:
                if (cur) {
                        /* level is holding, keep counting */
                        pin.debounce++;
                } else {
                        pin.debounce = 0;
                        pin.state = PIN_IDLE;
                }
                break;
        case PIN_RELEASING:
                if (cur) {
                        pin.debounce = 0;
                        pin.state = PIN_IDLE;
                } else {
                        /* level is holding, keep counting */
                        pin.debounce++;
                }
                break;
        }

        if (pin.state > PIN_IDLE &&
                        pin.debounce > pin.debounce_threshold) {
                /* report the pin press or release */
                /* and now the pin is idle */
                pin.state = PIN_IDLE;
                pin.debounce = 0;
                pin.pressed = cur;
        }
}

If there are multiple pins to check then I would replace the single struct pin with an array and loop over them. In that case struct pin should contain pin port and pin number information for your implementation of gpio_get_pin_value().
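A minimal sketch of the array approach follows. It assumes each pin carries a hypothetical port and bit field, and it takes the sampled levels from a caller-supplied array rather than from real GPIO reads so that it can be tried off-target. The names debounce_step() and check_all_pins() are illustrative, not part of the code above.

```c
#include <stddef.h>

enum pin_state { PIN_IDLE = 0, PIN_PRESSING = 1, PIN_RELEASING = 2 };

struct pin {
        enum pin_state state;
        char pressed;
        unsigned char debounce;
        unsigned char debounce_threshold;
        unsigned char port;   /* hypothetical: GPIO port index */
        unsigned char bit;    /* hypothetical: pin number within the port */
};

/* Advance one pin by one sample; returns 1 when 'pressed' changes. */
int debounce_step(struct pin *p, char cur)
{
        switch (p->state) {
        case PIN_IDLE:
                if (cur != p->pressed)
                        p->state = cur ? PIN_PRESSING : PIN_RELEASING;
                break;
        case PIN_PRESSING:
                if (cur) {
                        p->debounce++;
                } else {
                        p->debounce = 0;
                        p->state = PIN_IDLE;
                }
                break;
        case PIN_RELEASING:
                if (cur) {
                        p->debounce = 0;
                        p->state = PIN_IDLE;
                } else {
                        p->debounce++;
                }
                break;
        }

        if (p->state != PIN_IDLE && p->debounce > p->debounce_threshold) {
                p->state = PIN_IDLE;
                p->debounce = 0;
                p->pressed = cur;
                return 1;
        }
        return 0;
}

/* Poll every pin once; samples[i] is the current level of pins[i].
   Returns the number of pins whose debounced value changed on this poll. */
size_t check_all_pins(struct pin *pins, const char *samples, size_t n)
{
        size_t i, changed = 0;

        for (i = 0; i < n; i++)
                changed += (size_t)debounce_step(&pins[i], samples[i]);
        return changed;
}
```

On a real target the samples array would be replaced by a call to gpio_get_pin_value() using each pin's port and bit.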

When debouncing a physical button we generally shoot for around 50ms (that is, a level change shorter than 50ms is filtered out). We usually wind up with 47ms when debouncing with an RC circuit (for example a 47k Ohm resistor and a 1µF capacitor, making RC equal to 47ms) so either one seems like a good target.
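To turn a time target into a counter threshold you need to know how often the pin is polled. A small sketch, assuming a fixed polling period in milliseconds (the function name and the rounding choice are mine):

```c
/* Number of consecutive polls needed to cover target_ms, rounded up.
   Returns 0 if no polling period is given. */
unsigned debounce_counts(unsigned target_ms, unsigned poll_period_ms)
{
        if (poll_period_ms == 0)
                return 0; /* caller error: period must be non-zero */
        return (target_ms + poll_period_ms - 1) / poll_period_ms;
}
```

With a 5ms polling loop, a 50ms target works out to 10 counts, which is where a threshold like the one used in init() might come from.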

The very generic state machine described above just uses a counter, and this can be calculated or experimentally calibrated to the system based on the behavior of that system’s GPIOs. If there is a time source available, for example a timer block or an RTC with reasonable precision, this counter can be replaced with a time stamp, which makes it much easier to set the time threshold for debouncing.
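A sketch of the time stamp variant, assuming a free-running millisecond counter is passed in as now_ms (the struct and function names here are hypothetical):

```c
#include <stdint.h>

struct timed_pin {
        char pressed;        /* last debounced level */
        char candidate;      /* level currently being timed */
        uint32_t since_ms;   /* when 'candidate' was first seen */
        uint32_t hold_ms;    /* how long a level must hold, e.g. 50 */
};

/* Feed one sample taken at now_ms; returns 1 when 'pressed' changes. */
int timed_debounce(struct timed_pin *p, char cur, uint32_t now_ms)
{
        if (cur != p->candidate) {
                /* level changed: restart the timing window */
                p->candidate = cur;
                p->since_ms = now_ms;
                return 0;
        }
        /* unsigned subtraction stays correct across counter wrap-around */
        if (cur != p->pressed &&
            (uint32_t)(now_ms - p->since_ms) >= p->hold_ms) {
                p->pressed = cur;
                return 1;
        }
        return 0;
}
```

Here the threshold is expressed directly in milliseconds, so it no longer depends on how often the pin is polled.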

Developing STM8 boot code with SDCC

I’m using the open source SDCC toolchain to develop an application for the STM8 microcontroller and part of that requires a custom bootloader (what ST’s manuals refer to as User Boot Code or UBC) and application firmware. Here are some notes on how to use SDCC and stm8flash to develop and flash the bootloader and application.

The UBC concept itself is mostly a convention on STM8. The hardware does not do much with it aside from treating the UBC area of flash as write-protected (the idea is that boot code is not field-upgradeable in a typical product whereas we may wish to reflash the application firmware).

Boot process and interrupts

The STM8 uses option byte 1 to determine the size of the UBC (it’s 0 by default meaning there is no boot code). Setting this to a non-zero size reserves a portion of flash (starting at 0x8000) for the UBC. For example setting byte 1 to 4 reserves four 256 byte blocks or 1KB for the UBC.

It is up to the boot code to jump to the application. The STM8 CPU assumes that the interrupt vector table is located at 0x8000 so this single table must be shared between the UBC and the application.

Interrupt vector table

In SDCC, interrupt vectors look like:

void usart1_rx_irq(void) __interrupt(28)

Interrupt handlers are marked with the __interrupt attribute, which specifies their IRQ number (and therefore their slot in the interrupt vector table). SDCC needs to see each handler, or at least its prototype, in the translation unit (file) that implements main().

The interrupt vector table is placed at the start of the program (0x8000 by default, or whatever is set by using the --code-loc option). The UBC will be placed at 0x8000 along with its vector table but the application needs to be placed after the UBC (starting with its own vector table). So for a 1KB UBC (that is, option byte 1 is set to 4) we would build the application firmware with --code-loc=0x8400 and we know that the interrupt vector table for the application is at 0x8400.

The STM8 manual shows the offsets from the start of the vector table for each interrupt handler. For the above interrupt 28, the offset is 0x78. Assuming the boot code does not need to do anything with interrupt 28, we could simply redirect to the application firmware’s implementation of that interrupt handler. That is, in the UBC we would have something like:

void usart1_rx_irq(void) __interrupt(28)
{
    __asm jpf 0x8478 __endasm;
}

This connects the real interrupt 28 to the redirected handler in the application’s table. The application’s implementation of interrupt 28 would do whatever is appropriate for handling that interrupt.

The UBC should redirect every single interrupt to the right location to provide equivalent functionality in the application. The table itself contains 4-byte entries:

  • 0x00: the reset vector
  • 0x04: trap handler
  • 0x08: interrupt 0 (unused)
  • 0x0C: interrupt 1 (FLASH)
  • 0x10: interrupt 2 (DMA 0/1)

…and so on. As such, interrupt 28 is 4 * 28 + 8 or 120 which gives us the offset 0x78 and, if the application starts at 0x8400 the redirected vector is at 0x8478.

We can calculate some offsets and addresses in code (or the preprocessor) to make life easier.
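A sketch of such macros, using the 256-byte UBC block size and the 4-byte vector entries described above. The macro names are mine, and app_vector() exists only so the arithmetic can be checked on a host.

```c
#include <stdint.h>

#define VECTOR_TABLE_BASE  0x8000UL  /* start of flash and of the UBC */
#define UBC_BLOCK_SIZE     0x100UL   /* 256-byte UBC blocks, as described above */
#define UBC_BLOCKS         4UL       /* the value written to option byte 1 */
#define APP_BASE           (VECTOR_TABLE_BASE + UBC_BLOCKS * UBC_BLOCK_SIZE)

/* reset and trap occupy the first two 4-byte slots, so IRQ n is at 8 + 4*n */
#define VECTOR_OFFSET(irq) (8UL + 4UL * (irq))
#define APP_VECTOR(irq)    (APP_BASE + VECTOR_OFFSET(irq))

/* Address of the application's vector table entry for a given IRQ. */
uint32_t app_vector(unsigned irq)
{
        return (uint32_t)APP_VECTOR(irq);
}
```

For interrupt 28 this gives offset 0x78 and, with a 1KB UBC, the redirect target 0x8478 used in the handler above.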


To flash the MCU:

  • get the current option bytes content from the MCU
  • modify that content to set the UBC size (and any other changes needed)
  • write back the modified option bytes
  • write the bootloader
  • write the application

Using the STM8 discovery board (with STM8L151), we can read the option bytes from the MCU:

stm8flash -c stlink -p stm8l151?6 -s opt -r opt.bin

Then edit opt.bin as needed. Note that byte 0 must always be set to 0xAA to keep the SWIM protocol usable. To write it back:

stm8flash -c stlink -p stm8l151?6 -s opt -w opt.bin

To write the bootloader, boot.ihx, to the default location (0x8000):

stm8flash -c stlink -p stm8l151?6 -w boot.ihx

And then to write the application to (for example) 0x8400:

stm8flash -c stlink -p stm8l151?6 -s 0x8400 -w fw.ihx

Self-programming the Flash

Warning: this is going to get very hacky!

One of the typical tasks of a bootloader is accepting a new application to write into the application section of the Flash. There isn’t much Flash on a typical STM8 microcontroller so we’re likely to implement a scheme that involves:

  • the application is told to jump to boot code and accept a firmware update
  • the boot code starts and, rather than jumping to the application, waits for new firmware
  • new firmware is received and written to the Flash, block by block
  • having completed the reflashing process, the boot code is told to boot the new application

That’s generally easy enough to implement but on STM8 (and many other parts) an efficient block-oriented Flash operation requires us to be executing from RAM rather than in-place in the Flash. This requires us to place the actual code that performs the erase and write operation into RAM as well as the block of data that we wish to write.

Unlocking the Flash

The STM8 program flash is unlocked with the following sequence:
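A minimal sketch of the unlock, assuming the two program-memory keys that ST's reference manual gives for the FLASH_PUKR register (0x56 followed by 0xAE); check them against the manual for your part. The register address is passed in as a pointer so the routine can be exercised off-target. On hardware it would come from the device header.

```c
#include <stdint.h>

#define MASS_KEY1 0x56u  /* first program-memory key (assumed from ST's manual) */
#define MASS_KEY2 0xAEu  /* second program-memory key (assumed from ST's manual) */

/* pukr: address of the FLASH_PUKR register */
void flash_unlock_program(volatile uint8_t *pukr)
{
        *pukr = MASS_KEY1;
        *pukr = MASS_KEY2;
}
```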


The Flash can be erased and written after that.

Calling a RAM function

Unfortunately SDCC is missing linker features to help us do this. We cannot tell the linker to place a routine in RAM and, to make matters worse, we cannot use a linker-derived symbol in our C code to implement a simple memcpy() of the function in question from Flash to RAM, nor can we learn the length of the function (all things that GCC and proprietary stm8 toolchains can do).

I decided to work around this limitation by taking my own hacky approach:

  • write a dummy program that implements my RAM function, flash_write_block and compile it with SDCC like any other C program
  • use the assembly output to locate the implementation of flash_write_block and retrieve the sequence of bytes (machine instructions) that make this function.
  • save those bytes as a C array and include it in my bootloader program as static data.
  • at boot, memcpy() that array to a location in RAM and call that location. This causes flash_write_block() to execute in RAM and then return.

We need a location for the “array” in RAM to jump to. SDCC provides an __at attribute to enable us to place a variable at a set location. The static array will have an underscore in front of its name (by my convention) so I decided on:

__at(0x400) char _flash_write_block_ram[sizeof(_flash_write_block)];

This function needs to know two things:

  • the location of a block of data to write (128 bytes on my STM8 target)
  • the destination to write to

The data to write must also be in RAM so I selected a fixed location in RAM to hold that block using SDCC’s __at attribute:

__at(0x380) char data[128];

I decided to make the destination a “block number” where 0 is the first block of application firmware. My application starts after the UBC, for example at 0x8400, so block 0 is address 0x8400 and block 1 is 0x8480 (the next block). I can push this to the stack (as an argument to flash_write_block()) or, since the data is in RAM anyway, we can make a RAM location to store this as well:

__at(0x37C) uint32_t block;

In the bootloader’s main(), we simply copy from Flash to RAM:

memcpy(_flash_write_block_ram, _flash_write_block, sizeof(_flash_write_block));

then, whenever we need to call flash_write_block() in RAM, we simply write the data block and block number to the defined locations and call. For example,

block = 0; /* Write to the 0th block of the application */
__asm call 0x400 __endasm;

…and whatever is in data will be written to 0x8400.
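The block number to address mapping is simple enough to sketch and check on a host. The constants mirror the values used in this post (a 1KB UBC and 128-byte blocks); the function name is illustrative.

```c
#include <stdint.h>

#define APP_BASE          0x8400UL  /* first byte after a 1KB UBC */
#define FLASH_BLOCK_SIZE  128UL     /* block size on the target described here */

/* Flash address of the given application block (block 0 is APP_BASE). */
uint32_t block_to_addr(uint32_t block)
{
        return (uint32_t)(APP_BASE + FLASH_BLOCK_SIZE * block);
}
```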

Implementing the RAM function

The STM8 reference manual describes several ways to program the Flash. I chose the following:

  • erase the target block
  • wait for Flash operation to finish
  • request “fast” block programming (we’ll program all 128 bytes, and the block was already erased)
  • write all 128 bytes to the target block
  • wait for Flash operation to finish

The data to write is at arbitrary RAM location 0x380 and the block number is at 0x37C. We also need a loop counter for later (and SDCC does not support variable declarations mixed with code).

#include "stm8l15x.h" 

#define BOOT_SIZE   0x400 /* This must match the UBC setting as well */
#define APP_BASE    (0x8000 + BOOT_SIZE)

void flash_write_block(void)
    unsigned i;
    uint32_t block = *(uint32_t *)0x37C;
    uint32_t addr = APP_BASE + (FLASH_BLOCK_SIZE * block);
    uint8_t *dest = (uint8_t *)(addr);
    uint8_t *data = (uint8_t *)0x380;

We now know where to write so let’s start by erasing the block. This is done by requesting an erase operation and then writing 0 to the first word in the block:

    *((uint32_t *)(addr)) = 0;

We then wait for the operation to finish:


Now we can request block programming (specifically the “fast” version):


To program, we simply copy data to dest (we can’t call memcpy() since it is not in RAM):

    for (i = 0; i < FLASH_BLOCK_SIZE; i++)
        dest[i] = data[i];

That should do it! Now wait for the operation to finish:


and we are done. SDCC will label this function _flash_write_block in the assembly output. In my hacky scheme, I need a header file such as ramfunc.h that has something like:

#pragma once

static char _flash_write_block[] = { /* the instructions */ };

to make this whole thing work.
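For reference, here is a host-testable sketch of the same erase, program, and wait flow. It is not the function that gets copied to RAM: the register block is a hypothetical stand-in passed by pointer (on the target the real definitions come from stm8l15x.h), and the bit positions for ERASE, FPRG and EOP are my assumptions based on ST's reference manual.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two FLASH registers used here. */
struct flash_regs {
        volatile uint8_t cr2;    /* programming mode requests */
        volatile uint8_t iapsr;  /* status flags */
};

#define CR2_FPRG   0x10u   /* request fast block programming (assumed) */
#define CR2_ERASE  0x20u   /* request block erase (assumed) */
#define IAPSR_EOP  0x04u   /* end of programming or erase (assumed) */
#define BLOCK_SIZE 128u

static void wait_done(struct flash_regs *f)
{
        while (!(f->iapsr & IAPSR_EOP))
                ; /* spin until completion is reported */
}

void write_block_sketch(struct flash_regs *f, uint8_t *dest, const uint8_t *data)
{
        unsigned i;

        f->cr2 |= CR2_ERASE;             /* 1. request a block erase... */
        for (i = 0; i < 4; i++)
                dest[i] = 0;             /*    ...and zero the first word */
        wait_done(f);

        f->cr2 |= CR2_FPRG;              /* 2. request fast programming */
        for (i = 0; i < BLOCK_SIZE; i++)
                dest[i] = data[i];       /* 3. copy the whole block */
        wait_done(f);
}
```

Unlike real hardware, the stand-in never clears its request or status bits on its own, so this only demonstrates the order of operations.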

Retrieving instructions for RAM

Normally we could identify where in a binary the target function is implemented and indeed SDCC lets us know through the linker map file. That said, the linker does not make a binary (it makes an Intel hex by default) and I don’t have much information beyond that to help me. After fighting with the SDCC toolchain a while I decided to “scrape” the assembly file for the dummy program implementing the flash_write_block() function to retrieve the machine instructions needed.

This isn’t my proudest moment but I need to get things working and I didn’t see a reliable path forward from the linker’s output. The dummy program just needs:

#include "stm8l15x.h" 

#define BOOT_SIZE   0x400 /* This must match the UBC setting as well */
#define APP_BASE    (0x8000 + BOOT_SIZE)

void flash_write_block(void)
{
    /* the implementation, shown above */
}

void main(void)
{
    flash_write_block(); /* I want to see it called */
    while (1);
}

Having compiled this dummy program, SDCC will leave us with a few interesting files, the most interesting to me being the listing (.lst) file.

The assembly listing will show the function starting with:

  000000                        115 _flash_write_block:

That is, an underscore, the name, and a colon (it’s a label). The last instruction in the function should of course be a ret:

  00006F 81               [ 4]  193     ret

So we just need to grab the instructions (column 2 above, for instance 0x81 is the STM8 ret instruction) and write them into an array. A more complex line may look like:

  000060 A3 00 80         [ 2]  184     cpw x, #0x0080

And our array should have 0xA3, 0x00, 0x80, corresponding to this.
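The column extraction for a single listing line can be sketched in C as follows. This assumes the layout shown above (a 6-digit address, the opcode bytes, then a bracketed cycle count); lst_line_bytes() is a hypothetical helper, not part of any toolchain.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define HEX_DIGITS "0123456789ABCDEFabcdef"

/* Extract opcode bytes from one listing line such as
   "  000060 A3 00 80         [ 2]  184     cpw x, #0x0080".
   Returns the number of bytes stored in out (0 if the line has none). */
size_t lst_line_bytes(const char *line, unsigned char *out, size_t max)
{
        size_t n = 0, len;
        const char *p = line;
        const char *bracket = strchr(line, '[');

        if (bracket == NULL)
                return 0;         /* no cycle column: not an instruction */

        while (isspace((unsigned char)*p))
                p++;
        len = strspn(p, HEX_DIGITS);
        if (len != 6)
                return 0;         /* first column must be the address */
        p += len;

        while (n < max) {
                while (p < bracket && isspace((unsigned char)*p))
                        p++;
                if (p >= bracket)
                        break;    /* reached the cycle column */
                len = strspn(p, HEX_DIGITS);
                if (len != 2)
                        return 0; /* unexpected token before '[' */
                out[n++] = (unsigned char)strtoul(p, NULL, 16);
                p += len;
        }
        return n;
}
```

Label and comment lines yield zero bytes, so a caller can feed it every line between the function's label and its ret.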

There are a few ways to do this but I wound up writing a quick and dirty Python script to do it. It takes a path to a .lst file and a function to “extract” and creates an output file (C header file) with the resulting instructions.

This again isn’t my proudest moment but, hey, we’re getting very hacky here:

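def write_out(fields, outfile, more):
    # Reconstructed helper (an assumption, the original is not shown here):
    # opcode bytes sit between the address column and the '[cycles]' column.
    opcodes = []
    for field in fields[1:]:
        if field.startswith('['):
            break
        opcodes.append('0x' + field)
    outfile.write(', '.join(opcodes) + (', ' if more else '};\n'))

def load(f, fname, outfile):
    found = False

    for line in f:
        fields = line.split()
        if found:
            # only instruction lines have an address and a '[cycles]' column
            is_code = (len(fields) >= 3 and len(fields[0]) == 6 and
                       any(x.startswith('[') for x in fields[2:]))
            if is_code:
                if fields[1] == '81':
                    found = False
                write_out(fields, outfile, found)
        elif fields and fields[-1] == fname + ':':
            outfile.write('#pragma once\n\nstatic char ' +
                          fname + '[] = { ')
            found = True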


The load() method above should capture everything given a function fname from an input file f and write a C header file to outfile. It’s not doing much error checking at all and we assume that it’s handed sane input with a 0x81 instruction to finish things off.

For reference, the output file corresponding to the function I described here gives me, at this time:

#pragma once

static char _flash_write_block[] = { 0x52, 0x08, 0xAE, 0x03, 0x7C, 0x89, 0xEE, 0x02, 0x51, 0x85, 0xFE, 0x9F, 0x88, 0xA6, 0x07, 0x90, 0x58, 0x09, 0x01, 0x02, 0x49, 0x01, 0x4A, 0x26, 0xF6, 0x84, 0x72, 0xA9, 0x84, 0x00, 0xA9, 0x00, 0x02, 0xA9, 0x00, 0x95, 0x17, 0x05, 0x1F, 0x03, 0x1E, 0x05, 0x1F, 0x01, 0xAE, 0x50, 0x51, 0xF6, 0xAA, 0x20, 0xF7, 0x1E, 0x05, 0x90, 0x5F, 0xEF, 0x02, 0xFF, 0xAE, 0x50, 0x54, 0xF6, 0xA5, 0x05, 0x26, 0xF8, 0xAE, 0x50, 0x51, 0xF6, 0xAA, 0x10, 0xF7, 0x5F, 0x1F, 0x07, 0x16, 0x01, 0x72, 0xF9, 0x07, 0x1E, 0x07, 0x1C, 0x03, 0x80, 0xF6, 0x90, 0xF7, 0x1E, 0x07, 0x5C, 0x1F, 0x07, 0x1E, 0x07, 0xA3, 0x00, 0x80, 0x25, 0xE7, 0xAE, 0x50, 0x54, 0xF6, 0xA5, 0x05, 0x26, 0xF8, 0x5B, 0x08, 0x81};

…and that does the trick.

Building your own Microblaze toolchain from source

I recently needed to work with a synthesized Microblaze CPU set up as a microcontroller. After using Vivado to generate the initial SDK I decided to work through building a toolchain from source.

There are two Microblaze worlds, so to speak. The big ones configured with an MMU are able to run Linux, and for those we have pre-packaged toolchains such as gcc-microblaze-linux-gnu in Fedora. For the small ones (no MMU, meant to run bare-metal or an RTOS) one typically uses the Vivado-supplied GCC toolchain; however, that may be outdated (GCC 5.2 in Vivado 2016.04 for example) and may not have features that you require. Furthermore the Vivado-supplied toolchain was for some reason built for a 32-bit x86 host.


Microblaze support seems to be open-sourced and generally available, however Xilinx do not make much effort to get their changes into mainline so we need to use Xilinx’s forks of various projects in order to build something that can compile and link for Microblaze. Luckily Xilinx maintains a Registered Guest Resources site where we can obtain source snapshots for anything open-source that they utilize. There we find source archives for the entire toolchain, either the GCC 5.2 that is shipped in Vivado 2016.04 or at this time a GCC 6.2 as well. These contain:

  • GCC
  • GDB
  • binutils (Xilinx’s changes don’t appear to be in mainline)
  • newlib (the standard C library implementation for small systems)

Extract the archive and then untar the contents (each source snapshot is a tarball inside that archive).


The toolchain itself is built by utilizing the crosstool-ng toolchain builder. Xilinx maintain their own fork of crosstool-ng with some Microblaze-specific changes. I chose to use that, though the top-level Makefile needed changes from mainline crosstool-ng in order to build (I patched the Xilinx fork with changes from mainline).

As a starting point, I used the samples/microblaze-xilinx-elf configuration as the crosstool-ng .config and then ran make menuconfig. Note that experimental support (CT_EXPERIMENTAL) is enabled: this is required for Microblaze to be an option in crosstool-ng.


We need to build crosstool-ng but we don’t have to install it (it can be run from its source directory). To do that:

./configure --enable-local
make

The resulting ct-ng binary is what we run to use crosstool-ng.


Copy the sample configuration file to .config and then run ct-ng menuconfig to further configure the toolchain. The options you should configure include:

  • no MMU
  • set endianness to match your synthesized target (the default is big-endian)
  • build a multilib toolchain (that way you can adjust compilation based on selected Microblaze CPU options, for example the barrel shifter is optional)
  • build a sysrooted toolchain using cross-compilation
  • the target operating system is bare-metal
  • the target binary format is ELF
  • for binutils, configure with --disable-sim, the simulator will not build for microblaze and you’re unlikely to need it, so turn it off.
  • for gcc, enable LTO support but disable graphite: unfortunately the library graphite needs to do its work will not build for microblaze at this time.
  • for newlib, enable space savings

Also for each of the tools (gcc, binutils, gdb, etc) configure the source location to point to the absolute path to your extracted Xilinx source snapshots.


Save your configuration and then run ./ct-ng build. This will hopefully build the toolchain, but it will take a very long time. In fact, crosstool-ng will build a host toolchain first and then use that to build the cross-toolchain.

If successful, you will now have a directory named tool-build one level up from your crosstool-ng directory. This contains a bin directory with the resulting toolchain. Assuming typical options, the toolchain will have a prefix of microblaze-xilinx-elf for a big-endian toolchain or microblazeel-xilinx-elf for little-endian.

Trying things out

The Xilinx SDK builds an archive named libxil.a (the xil library) which in turn contains an entry point and some peripheral initialization code. The source code for this is provided and you can rebuild that library as needed, though you can use the pre-built one as-is if you wish. You will find that the linker will require libxil either way. To test out the new toolchain, supply a path to the libs directory containing your copy of libxil.a via the -L option.

We can ask the toolchain to dump the options with which it was built in order to sanity-check our configuration:

microblaze-xilinx-elf-gcc -v

Finally, we can build a sample program:

microblaze-xilinx-elf-gcc -L/path/to/libs foo.c -o foo.elf

And we can generate a binary suitable for loading on hardware:

microblaze-xilinx-elf-objcopy -S -I elf32-big foo.elf -O binary foo.bin

(adjust the above for your toolchain prefix/endianness as needed). The ELF should be usable in the Vivado debugger (a wrapper over GDB), or GDB itself if you have set up the Xilinx debug bridge, and the binary can be loaded onto the target if you have some means of doing that besides the debugger.

Testing applications with qemu user mode, automake, and buildroot

Previously I discussed using qemu user mode to run cross-compiled binaries. Now let’s put a few things together and run unit tests automatically with automake’s make check. Then we’ll integrate everything into a build system (buildroot) and automatically run cross-compiled tests as part of the build process. This has a number of advantages including:

  • no need to build for both x86 and ARM
  • no need to deploy cross-compiled unit tests to a real target or even a qemu-system-arm machine, they’ll just run on your machine
  • easy integration and automation in your build system without requiring any additional steps or configurations

Many thanks to Andrey Smirnov who explored this further after discussing qemu user mode with me and figured all of this out (I’m mostly just writing up notes on what he did).

Running tests with automake

automake has a feature called LOG_COMPILER as part of its Parallel Test Harness infrastructure. If this variable is set, tests are executed with the LOG_COMPILER as a wrapper. The Parallel Test Harness documentation gives perl as an example wrapper but we can take advantage of this feature to wrap our tests with a shell script calling out to qemu in order to run them directly on our build machine.

We can capture and set this variable from the environment in configure.ac, for instance:

AC_ARG_VAR([LOG_COMPILER], [wrapper used to run the tests])

It could be set like this when invoking configure:

LOG_COMPILER=qemu-wrapper.sh ./configure

Of course we’d also include our cross-compilation options as well. We can then run make and make check and we should see that the tests invoked by make check are now run using qemu-wrapper.sh. That in turn needs to be some shell script that runs its first command-line argument using qemu-arm with appropriate options (such as -L) set.

Running tests with buildroot

Now let’s put everything together into a buildroot recipe. I assume you’re familiar with how buildroot works (if not, please consult the buildroot manual and familiarize yourself with how external.mk and recipes work).

Add a test runner

We need something simple to point LOG_COMPILER to so we can add that as a shell script somewhere in your external/ tree. For example, external/test-runner.sh could look like:


#!/bin/sh
exec "${HOST_DIR}/usr/bin/qemu-arm" -L "${STAGING_DIR}" "$1"

Keep in mind that HOST_DIR points to your build output’s host directory, containing the host’s toolchain and, in this case, also the host-qemu executable. Meanwhile the STAGING_DIR gives us exactly what we need for qemu’s -L option.

We can then create a variable that points to this script. For instance, add:

TEST_RUNNER := $(BR2_EXTERNAL)/test-runner.sh

to external.mk. Now package recipes have access to the test-runner.sh script via TEST_RUNNER.

Making sure qemu-arm is available

Enable qemu from the Host Utilities menu. Specifically you want to turn on:

  • host qemu
  • Enable Linux user-land emulation

You don’t need “Enable system emulation” unless you want to use qemu’s machine emulator mode.

Packages needing this qemu-based test infrastructure simply need to select BR2_PACKAGE_HOST_QEMU as a dependency in their Config.in. For example,

config BR2_PACKAGE_EXAMPLE_APP
    bool "example-app"
    select BR2_PACKAGE_HOST_QEMU
    help
      This is our example app.

They also need to call out host-qemu as a dependency in their recipe. For example, in example-app.mk we could have:

EXAMPLE_APP_DEPENDENCIES = host-qemu

Now when example-app is built, buildroot will build and deploy qemu-arm (to host).

Adding test hooks to a recipe

Given a typical buildroot recipe for our example app, example-app.mk, all we need to do is set LOG_COMPILER for automake and tell buildroot to run make check as a post-build step. Setting LOG_COMPILER is easy by adding to buildroot’s <package>_CONF_ENV variable:

EXAMPLE_APP_CONF_ENV = LOG_COMPILER=$(TEST_RUNNER)

Next, we add a post-build hook:

define example-app-make-check
	$(MAKE) -C $(@D) check
endef

EXAMPLE_APP_POST_BUILD_HOOKS += example-app-make-check

Now we should see two things:

  • buildroot will add our LOG_COMPILER variable to the environment when running configure in this package
  • having built the package, buildroot will execute make check in the package build directory and that in turn will run any tests the package provides with qemu-arm

To try this out, build the package:

make example-app

You should hopefully see the familiar make check output after the application is built. Running file on binaries in the build directory should reveal that they are in fact ARM binaries!

A failed unit test in the above example will break the build for that package. You can adjust the implementation slightly if that’s not what you want, and you could even introduce a configuration option in your external/Config.in menu to make tests optional or control whether test failures break the build. As usual, there is a lot that can be customized with buildroot.

Specifying which tests to run in a recipe

We may not want to run every test in the build system. Generally speaking, we can run most unit tests but we will need to skip shell-based functional tests. automake provides a way to do that with the TESTS environment variable. For example, if we want to run only “test1” and “test2” we can specify them like this:

define example-app-make-check
	TESTS="test1 test2" $(MAKE) -C $(@D) -e check
endef

EXAMPLE_APP_POST_BUILD_HOOKS += example-app-make-check

The -e argument tells make to give variables taken from the environment precedence over assignments in the makefile, so our TESTS list replaces the package’s own (TESTS is the variable naming which tests to run). Now make check will run just those tests.

qemu and kernel version

The recipe that builds qemu in buildroot, package/qemu/qemu.mk, performs a check comparing the host’s kernel version to the target’s and will only succeed if the host version is greater than or equal to the target’s. You may find that buildroot refuses to build qemu-arm for you. The reasoning is explained in qemu.mk: in user mode, qemu translates system calls, so it’s assumed that the target does not make calls that don’t exist on your host machine.

If this is a problem, you can defeat this check (HOST_QEMU_COMPARE_VERSION) at your own risk or upgrade your host machine if possible. Alternately, you can use your own qemu rather than buildroot’s (for example a distribution package or your own build). Generally speaking, it’s unlikely that a unit test is going to make system calls that don’t exist on your host machine but of course it depends on what you’re testing and anything is possible.

Using qemu user mode to run cross-compiled binaries

The qemu CPU emulator is typically used to emulate an entire system, for example we can use qemu-system-arm to start a simulated ARM-based machine, boot Linux, and then run whatever software is appropriate. There is also a user mode which lets us run a cross-compiled executable right on our host machine and that can be used for various isolated testing tasks (for instance, running unit tests in the target system architecture).

A useful program does need some libraries however and it may need an environment. qemu user mode takes a number of arguments that can be used to supply these. The -L option lets us specify the path for the ELF interpreter prefix and we can utilize this to point the emulator to a reasonable set of libraries (for example a target file system sitting on my x86 laptop). Additional environment variables can be set with -E.

Running a cross-compiled program

I have built a BSP using buildroot and now have a target/ directory that contains nearly everything that will wind up on my root file system image (but note a few caveats) for this target. I can take advantage of the fact that target/ has everything that is needed to run an executable (that is, there’s a /lib directory there with libraries in the target architecture) so now I can use qemu to run an executable.

For instance,

qemu-arm -L target/ /path/to/some/executable

Or to run the cross-compiled ARM /bin/ls right from the target directory,

qemu-arm -L target/ target/bin/ls

should work provided the ELF interpreter prefix matches up. In my case ls is a symlink to busybox and running file on that produces:

file target/bin/busybox 
target/bin/busybox: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.3, for GNU/Linux 4.7.0, not stripped

and target/ has lib/ld-linux.so.3 present so passing -L target gets me a running executable.

In addition, $? will report the cross-compiled program’s return code correctly so with the help of the target ELF interpreter and libraries I can now run unit tests after building my BSP, those unit tests can run in their native architecture, and I can capture output from them and utilize their return codes (for example via make check in autotools) like I would expect with native unit tests. Furthermore, the unit test executables themselves never need to leave my build system (that is, they don’t need to wind up on the target root file system or even an emulated device) and I do not need a special build just for running those tests.
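As an illustration of what such a test looks like, here is a sketch of a unit test that reports its result purely through its exit status. Both checksum16() and run_checksum_test() are hypothetical examples, not code from any project mentioned here.

```c
#include <stdint.h>

/* Hypothetical function under test: 16-bit additive checksum. */
uint16_t checksum16(const uint8_t *buf, unsigned len)
{
        uint16_t sum = 0;
        unsigned i;

        for (i = 0; i < len; i++)
                sum = (uint16_t)(sum + buf[i]);
        return sum;
}

/* Returns the exit status for the test program: 0 on success, 1 on
   failure. A test's main() would simply return this value. */
int run_checksum_test(void)
{
        static const uint8_t sample[] = { 0x01, 0x02, 0x03 };

        if (checksum16(sample, sizeof(sample)) != 0x0006)
                return 1;
        if (checksum16(sample, 0) != 0)
                return 1;
        return 0;
}
```

automake's test harness treats an exit status of 0 as a pass, 77 as a skipped test, and 99 as a hard error, so a test built this way behaves the same whether it runs natively or under qemu-arm.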

Unit Testing

There’s arguably value in running unit tests on their target CPU (even if that CPU is emulated), for example:

  • Memory alignment will match reality (x86 supports unaligned memory access, ARMv5 does not and we can write C code that will not work on ARMv5)
  • Math behaves the same (consider a 64-bit x86 machine performing 64-bit arithmetic vs. a 32-bit ARM machine performing the same arithmetic, we can write C programs that will give different results on different CPUs)
  • We can run tests in their expected target endianness and that can matter even though carefully written code should behave correctly.
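The endianness point can be made concrete with a small sketch (all names here are illustrative). The first routine gives the same bytes on any CPU, while the second depends on the byte order of the CPU the test runs on, which is exactly the kind of difference that only shows up when tests run in the target architecture.

```c
#include <stdint.h>
#include <string.h>

/* Portable: always emits big-endian bytes, whatever the CPU's byte order. */
void put_be32(uint8_t out[4], uint32_t v)
{
        out[0] = (uint8_t)(v >> 24);
        out[1] = (uint8_t)(v >> 16);
        out[2] = (uint8_t)(v >> 8);
        out[3] = (uint8_t)v;
}

/* Not portable: copies the in-memory representation, so the result
   depends on the byte order of the CPU running the code. */
void put_native32(uint8_t out[4], uint32_t v)
{
        memcpy(out, &v, sizeof(v));
}

/* Returns 1 on a little-endian CPU, 0 on a big-endian one. */
int cpu_is_little_endian(void)
{
        uint8_t b[4];

        put_native32(b, 1);
        return b[0] == 1;
}
```

A test that compares put_native32() output against fixed bytes would pass on one byte order and fail on the other.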

Unit tests run with qemu user mode would ideally be real testable units since we’re not simulating the entire machine (that is, they should not interact with hardware or other software since they are running on their own) but that is in line with unit testing practices anyhow. Functional tests that need additional software components or hardware can be tested on an emulated machine with qemu system mode.

Interactive debugging with GDB

qemu user mode can act as a GDB server, enabling us to debug our cross-compiled program. Consider the usual trivial program:

#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n");
    return 0;
}

We can compile this with debug symbols via our toolchain, for example:

arm-buildroot-linux-gnueabi-gcc -g -c hello.c -o hello.o
arm-buildroot-linux-gnueabi-gcc hello.o -o hello

Start qemu with a GDB server on an arbitrary port, for example 12345:

qemu-arm -L target/ -g 12345 ./hello

and it will wait for GDB to attach. We can then run our toolchain GDB (using the debug symbols in hello):

arm-buildroot-linux-gnueabi-gdb ./hello

and then, in GDB’s console, connect to the GDB server:

target remote localhost:12345

we should also tell GDB to use our target root file system as the sysroot (since that’s where all the libraries are), this enables GDB to find libc and anything else we may depend on:

set sysroot ./target

We can now set breakpoints and debug like we would any other program, for example:

b main

We’re now able to run and debug our cross-compiled program without needing to boot a real or even emulated machine, provided the program can run on its own without needing access to resources that qemu user mode won’t provide (again in those cases, use qemu-system-arm to boot an emulated machine running Linux).
