Developing STM8 boot code with SDCC
I’m using the open source SDCC toolchain to develop an application for the STM8 microcontroller and part of that requires a custom bootloader (what ST’s manuals refer to as User Boot Code or UBC) and application firmware. Here are some notes on how to use SDCC and stm8flash to develop and flash the bootloader and application.
The UBC concept itself is mostly a convention on STM8. The hardware does not do much with it aside from treating the UBC area of flash as write-protected (the idea is that boot code is not field-upgradeable in a typical product whereas we may wish to reflash the application firmware).
Boot process and interrupts
The STM8 uses option byte 1 to determine the size of the UBC (it’s 0 by default
meaning there is no boot code). Setting this to a non-zero size reserves a
portion of flash (starting at 0x8000
) for the UBC. For example setting byte 1
to 4 reserves four 256 byte blocks or 1KB for the UBC.
It is up to the boot code to jump to the application. The STM8 CPU assumes that
the interrupt vector table is located at 0x8000
so this single table must be
shared between the UBC and the application.
Interrupt vector table
In SDCC, interrupt vectors look like:
void usart1_rx_irq(void) __interrupt(28)
{
}
Interrupt vectors must be implemented in the same translation unit (file) that
implements main()
and the __interrupt
attribute is used to specify their
IRQ number (which becomes an offset into the interrupt vector table).
The interrupt vector table is placed at the start of the program (0x8000
by
default, or whatever is set by using the --code-loc
option). The UBC will be
placed at 0x8000
along with its vector table but the application needs to be
placed after the UBC (starting with its own vector table). So for a 1KB UBC
(that is, option byte 1 is set to 4) we would build the application firmware
with --code-loc=0x8400
and we know that the interrupt vector table for the
application is at 0x8400
.
The STM8 manual shows the offsets from the start of the vector table for each interrupt handler. For the above interrupt 28, the offset is 0x78. Assuming the boot code does not need to do anything with interrupt 28, we could simply redirect to the application firmware’s implementation of that interrupt handler. That is, in the UBC we would have something like:
void usart1_rx_irq(void) __interrupt(28)
{
__asm jpf 0x8478 __endasm;
}
To connect the real interrupt 28 to the redirected handler in the application’s table. The application’s implementation of interrupt 28 would do whatever it is that is appropriate for handling that interrupt.
The UBC should redirect every single interrupt to the right location to provide equivalent functionality in the application. The table itself contains 4-byte entries:
- 0x00: the reset vector
- 0x04: trap handler
- 0x08: interrupt 0 (unused)
- 0x0C: interrupt 1 (FLASH)
- 0x10: interrupt 2 (DMA 0/1)
…and so on. As such, interrupt 28 is 4 * 28 + 8
or 120
which gives us the
offset 0x78
and, if the application starts at 0x8400
the redirected vector
is at 0x8478
.
We can calculate some offsets and addresses in code (or the preprocessor) to make life easier.
stm8flash
To flash the MCU:
- get the current option bytes content from the MCU
- modify that content to set the UBC size (and any other changes needed)
- write back the modified option bytes
- write the bootloader
- write the application
Using the STM8 discovery board (with STM8L151), we can read the option bytes from the MCU,
stm8flash -c stlink -p stm8l151?6 -s opt -r opt.bin
Then edit opt.bin
as needed. Note that byte 0 must always be set to 0xAA
to
keep the SWIM protocol usable. To write it back:
stm8flash -c stlink -p stm8l151?6 -s opt -w opt.bin
To write the bootloader, boot.ihx
to the default location (0x8000):
stm8flash -c stlink -p stm8l151?6 -w boot.ihx
And then to write the application to (for example) 0x8400:
stm8flash -c stlink -p stm8l151?6 -s 0x8400 -w fw.ihx
Self-programming the Flash
Warning: this is going to get very hacky!
One of the typical tasks of a bootloader is accepting a new application to write into the application section of the Flash. There isn’t much Flash on a typical STM8 microcontroller so we’re likely to implement a scheme that involves:
- the application is told to jump to boot code and accept a firmware update
- the boot code starts and, rather than jumping to the application, waits for new firmware
- new firmware is received and written to the Flash, block by block
- having completed the reflashing process, the boot code is told to boot the new application
That’s generally easy enough to implement but on STM8 (and many other parts) an efficient block-oriented Flash operation requires us to be executing from RAM rather than in-place in the Flash. This requires us to place the actual code that performs the erase and write operation into RAM as well as the block of data that we wish to write.
Unlocking the Flash
The STM8 program flash is unlocked with the following sequence:
FLASH->PUKR = FLASH_RASS_KEY1;
FLASH->PUKR = FLASH_RASS_KEY2;
while (!(FLASH->IAPSR & FLASH_FLAG_PUL));
The Flash can be erased and written after that.
Calling a RAM function
Unfortunately SDCC is missing linker features to help us do this. We cannot
tell the linker to place a routine in RAM and, to make matters worse, we cannot
use a linker-derived symbol in our C code to implement a simple memcpy()
of
the function in question from Flash to RAM, nor can we learn the length of the
function (all things that GCC and proprietary stm8 toolchains can do).
I decided to work around this limitation by taking my own hacky approach:
- write a dummy program that implements my RAM function,
flash_write_block
and compile it with SDCC like any other C program - use the assembly output to locate the implementation of
flash_write_block
and retrieve the sequence of bytes (machine instructions) that make this function. - save those bytes as a C array and include it in my bootloader program as static data.
- at boot,
memcpy()
that array to a location in RAM and call that location. This causesflash_write_block()
to execute in RAM and then return.
We need a location for the “array” in RAM to jump to. SDCC provides an __at
attribute to enable us to place a variable at a set location. The static array will have an underscore in front of its name (by my convention) so I decided on:
__at(0x400) char _flash_write_block_ram[sizeof(_flash_write_block)];
This function needs to know two things:
- the location of a block of data to write (128 bytes on my STM8 target)
- the destination to write to
The data to write must also be in RAM so I selected a fixed location in RAM to
hold that block using SDCC’s __at
attribute:
__at(0x380) char data[128];
I decided to make the destination a “block number” where 0 is the first block of application firmware. My application starts after the UBC, for example at 0x8400
so block 0 is address 0x8400
and block 1 is 0x8480
(the next block).
I can push this to the stack (as an argument to flash_write_block()
or, since the data is in RAM anyway, we can make a RAM location to store this as well:
__at(0x37C) uint32_t block;
In the bootloader’s main()
, we simply copy from Flash to RAM:
memcpy((void *)0x400, _flash_write_block_ram, sizeof(_flash_write_block));
then, whenever we need to call flash_write_block()
in RAM, we simply write
the data block and block number to the defined locations and call. For example,
block = 0; /* Write to the 0th block of the application */
__asm call 0x400; __endasm
…and whatever is in data
will be written to 0x8400
.
Implementing the RAM function
The STM8 reference manual describes several ways to program the Flash. I chose the following:
- erase the target block
- wait for Flash operation to finish
- request “fast” block programming (we’ll program all 128 bytes, and the block was already erased)
- write all 128 bytes to the target block
- wait for Flash operation to finish
The data to write is at arbitrary RAM location 0x380
and the block number is
at 0x37C
. We also need a loop counter for later (and SDCC does not support
variable declarations mixed with code).
#include "stm8l15x.h"
#define BOOT_SIZE 0x400 /* This must match the UBC setting as well */
#define APP_BASE (0x8000 + BOOT_SIZE)
void flash_write_block(void)
{
unsigned i;
uint32_t block = *(uint32_t *)0x37C;
uint32_t addr = APP_BASE + (FLASH_BLOCK_SIZE * block);
uint8_t *dest = (uint8_t *)(addr);
uint8_t *data = (uint8_t *)0x380;
We now know where to write so let’s start by erasing the block. This is done
by requesting an erase operation and then writing 0
to the first word in the
block:
FLASH->CR2 |= FLASH_CR2_ERASE;
*((uint32_t *)(addr)) = 0;
We then wait for the operation to finish:
while (FLASH->IAPSR & (FLASH_IAPSR_EOP | FLASH_IAPSR_WR_PG_DIS));
Now we can request block programming (specifically the “fast” version):
FLASH->CR2 |= FLASH_CR2_FPRG;
To program, we simply copy data
to dest
(we can’t call memcpy()
since it
is not in RAM):
for (i = 0; i < FLASH_BLOCK_SIZE; i++)
dest[i] = data[i];
That should do it! Now wait for the operation to finish:
while (FLASH->IAPSR & (FLASH_IAPSR_EOP | FLASH_IAPSR_WR_PG_DIS));
}
and we are done. SDCC will label this function _flash_write_block
in the
assembly output. In my hacky scheme, I need a header file such as ramfunc.h
that has something like:
#pragma once
static char _flash_write_block[] = { /* the instructions */ };
to make this whole thing work.
Retrieving instructions for RAM
Normally we could identify where in a binary the target function is implemented
and indeed SDCC lets us know through the linker map file. That said, the linker
does not make a binary (it makes an Intel hex by default) and I don’t have much
information beyond that to help me. After fighting with the SDCC toolchain a
while I decided to “scrape” the assembly file for the dummy program implementing
the flash_write_block()
function to retrieve the machine instructions needed.
This isn’t my proudest moment but I need to get things working and I didn’t see a reliable path forward from the linker’s output. The dummy program just needs:
#include "stm8l15x.h"
#define BOOT_SIZE 0x400 /* This must match the UBC setting as well */
#define APP_BASE (0x8000 + BOOT_SIZE)
void flash_write_block(void)
{
/* the implementation, shown above */
}
void main(void)
{
flash_write_block(); /* I want to see it called */
while (1);
}
Having compiled this dummy program, SDCC will leave us with a few interesting
files, the most interesting to me being the listing (.lst
) file.
The assembly listing will show the function starting with:
000000 115 _flash_write_block:
That is, an underscore, the name, and a colon (it’s a label). The last
instruction in the function should of course be a ret
00006F 81 [ 4] 193 ret
So we just need to grab the instructions (column 2 above, for instance 0x81
is the STM8 ret
instruction) and write them into an array. A more complex
line may look like:
000060 A3 00 80 [ 2] 184 cpw x, #0x0080
And our array should have 0xA3, 0x00, 0x80,
corresponding to this.
There are a few ways to do this but I wound up writing a quick and dirty Python
script to do it. It takes a path to a .lst
file and a function to “extract”
and creates an output file (C header file) with the resulting instructions.
This again isn’t my proudest moment but, hey, we’re getting very hacky here:
def load(f, fname, outfile):
found = False
for line in f:
fields = line.split()
if found:
if len(fields) >= 2:
if fields[1] == '81':
found = False
write_out(fields, outfile, found)
else:
if fields[len(fields) - 1] == fname + ':':
outfile.write('#pragma once\n\nstatic char ' + \
fname + '[] = { ')
found = True
outfile.write('};')
The load()
method above should capture everything given a function fname
from an input file f
and write a C header file to outfile
. It’s not doing
much error checking at all and we assume that it’s handed sane input with a
0x81
instruction to finish things off.
For reference, the output file corresponding to the function I described here gives me, at this time:
#pragma once
static char _flash_write_block[] = { 0x52, 0x08, 0xAE, 0x03, 0x7C, 0x89, 0xEE, 0x02, 0x51, 0x85, 0xFE, 0x9F, 0x88, 0xA6, 0x07, 0x90, 0x58, 0x09, 0x01, 0x02, 0x49, 0x01, 0x4A, 0x26, 0xF6, 0x84, 0x72, 0xA9, 0x84, 0x00, 0xA9, 0x00, 0x02, 0xA9, 0x00, 0x95, 0x17, 0x05, 0x1F, 0x03, 0x1E, 0x05, 0x1F, 0x01, 0xAE, 0x50, 0x51, 0xF6, 0xAA, 0x20, 0xF7, 0x1E, 0x05, 0x90, 0x5F, 0xEF, 0x02, 0xFF, 0xAE, 0x50, 0x54, 0xF6, 0xA5, 0x05, 0x26, 0xF8, 0xAE, 0x50, 0x51, 0xF6, 0xAA, 0x10, 0xF7, 0x5F, 0x1F, 0x07, 0x16, 0x01, 0x72, 0xF9, 0x07, 0x1E, 0x07, 0x1C, 0x03, 0x80, 0xF6, 0x90, 0xF7, 0x1E, 0x07, 0x5C, 0x1F, 0x07, 0x1E, 0x07, 0xA3, 0x00, 0x80, 0x25, 0xE7, 0xAE, 0x50, 0x54, 0xF6, 0xA5, 0x05, 0x26, 0xF8, 0x5B, 0x08, 0x81};
…and that does the trick.