Embedded Tribal Knowledge – And What Really Bloats Your Binaries

I was working on an embedded project for the STM32F446RETX and wanted to test how bad exceptions and virtual function calls really are for embedded development. From everything I’d ever read, these features were supposed to be “bad”—especially in memory-constrained environments.

So imagine my surprise when I compiled a simple test program and the resulting binary came in at 91.4kB Flash and 4.2kB RAM. That felt way too big for what the program did, and I was close to buying into the usual dogma:

  • “Exceptions are evil.”
  • “Virtual functions are bloated.”
  • “RTTI will explode your memory.”

But something didn’t sit right. So instead of accepting it blindly, I decided to dig deeper. And what I discovered flipped the entire narrative.

The Setup

I built my program with the following toolchain and optimization flags:

-std=c++20

# CPU-specific:
-mcpu=cortex-m4
-mthumb
-mfpu=fpv4-sp-d16
-mfloat-abi=hard

# Small C library:
--specs=nano.specs
# Optimizations:
-Os
-flto
-ffunction-sections
-fdata-sections
-fomit-frame-pointer
-fno-rtti

# Linker flags:
-Wl,--gc-sections
-s -Wl,--strip-all
-Wl,-TSTM32F446RETX_FLASH.ld
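
Put together, the full invocation looks roughly like this (the file names are placeholders, and include paths and HAL defines are omitted):

arm-none-eabi-g++ -std=c++20 \
  -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=hard \
  --specs=nano.specs \
  -Os -flto -ffunction-sections -fdata-sections -fomit-frame-pointer -fno-rtti \
  -Wl,--gc-sections -s -Wl,--strip-all \
  -Wl,-TSTM32F446RETX_FLASH.ld \
  main.cpp -o firmware.elf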

So I first removed all exceptions and replaced them with return error codes. However, that did not really change the binary size at all. Then it had to be those virtual function calls and all the vtables that C++ generates. But no, replacing them with plain functions and switch statements barely had any effect on binary size either.
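
To make that concrete, here is the kind of replacement I mean (a simplified sketch with made-up names, not the actual project code):

#include <cstdint>

// Stand-in for a real status-register check (hypothetical).
static bool uart_ready() { return true; }

// 1) throw -> return an error code
enum class Status : std::uint8_t { Ok, Timeout };

Status send_byte(std::uint8_t b) {
    if (!uart_ready()) {
        return Status::Timeout;       // instead of: throw TimeoutError{};
    }
    // ... write b into the UART data register ...
    return Status::Ok;
}

// 2) virtual dispatch -> switch over a plain enum
enum class Sink : std::uint8_t { Usart, Null };

void write(Sink sink, const std::uint8_t* data, std::uint32_t len) {
    switch (sink) {
        case Sink::Usart:
            for (std::uint32_t i = 0; i < len; ++i) {
                (void)send_byte(data[i]);   // instead of: stream->write(...)
            }
            break;
        case Sink::Null:
            break;                          // discard the data
    }
}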

Then, out of pure chance, I removed the static keyword from a code section that looked like the following and made usart_out a global variable:

Before:


void app(){
    static UsartOStream usart_out(&huart2);
    // ... program code ...
}

After:


UsartOStream usart_out(&huart2);
void app(){
    // ... program code ...
}

The UsartOStream is just a custom class that implements a virtual write() method and hooks into the HAL USART driver (it does not depend on std::ostream, which is another one of those binary bloaters).
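
A minimal sketch of such a class could look like this (illustrative only; member names and the exact HAL call may differ from the real implementation):

#include <cstddef>
#include "stm32f4xx_hal.h"   // UART_HandleTypeDef, HAL_UART_Transmit

// Illustrative sketch: a tiny output stream with a virtual write() hook,
// independent of std::ostream.
class UsartOStream {
public:
    explicit UsartOStream(UART_HandleTypeDef* huart) : huart_(huart) {}
    virtual ~UsartOStream() = default;

    // Blocking write through the HAL USART driver.
    virtual void write(const char* data, std::size_t len) {
        HAL_UART_Transmit(huart_,
                          reinterpret_cast<uint8_t*>(const_cast<char*>(data)),
                          static_cast<uint16_t>(len), HAL_MAX_DELAY);
    }

private:
    UART_HandleTypeDef* huart_;
};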

Boom. Binary size dropped from 91.4kB to 34.7kB Flash, and from 4.2kB to 3.7kB RAM. That's a difference of more than 56kB of Flash.
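
(The Flash and RAM numbers can be read straight off the ELF with the GNU tools; Flash usage is roughly text + data, RAM usage is data + bss. The file name below is a placeholder.)

arm-none-eabi-size firmware.elf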

❗ Embedded dogma is not embedded truth

“We test. We measure. We optimize based on facts.”

Naturally, I was curious—what the hell was static doing under the hood?

So I dumped all symbols with arm-none-eabi-nm and compared the two builds:

  • main_fsize_local_static.txt
  • main_fsize_global.txt
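
Roughly how those dumps were produced (the .elf names below are placeholders; --print-size and --size-sort make the big offenders easy to spot):

arm-none-eabi-nm --print-size --size-sort build_local_static.elf > main_fsize_local_static.txt
arm-none-eabi-nm --print-size --size-sort build_global.elf > main_fsize_global.txt
diff main_fsize_local_static.txt main_fsize_global.txt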

The difference was stunning.

The Hidden Cost of Function-local static

Function-local static variables are lazily initialized. That means GCC emits code to:

  • Ensure thread-safe one-time initialization
  • Check a guard variable (__cxa_guard_acquire) before initializing
  • Register a destructor with __cxa_atexit
  • Handle potential recursion and throw recursive_init_error
  • Support exception unwinding, complete with RTTI and name demangling
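
Conceptually, the checks above expand to something like the following (a hand-written illustration only; the real code is generated by GCC, replaces the plain bool with an ABI guard word wrapped in __cxa_guard_acquire/__cxa_guard_release, registers the destructor via __cxa_atexit, and calls __cxa_guard_abort if the constructor throws):

#include <new>   // placement new

void app() {
    // Storage and guard flag for the function-local static from the example above.
    alignas(UsartOStream) static unsigned char storage[sizeof(UsartOStream)];
    static bool constructed = false;            // stand-in for the ABI guard word

    if (!constructed) {                         // checked on every call to app()
        ::new (storage) UsartOStream(&huart2);  // run the constructor exactly once
        constructed = true;
    }
    UsartOStream& usart_out = *reinterpret_cast<UsartOStream*>(storage);
    // ... program code ...
}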

This pulled in nearly the entire C++ runtime:

  • __cxa_guard_acquire, __cxa_guard_release
  • __cxa_atexit, __gnu_cxx::recursive_init_error
  • __cxa_throw, __cxa_begin_catch, __cxa_rethrow, etc.
  • _Unwind_* functions
  • Demangling logic: everything starting with _d (e.g., _d_template_arg, _d_source_name, _d_type, etc.)

Top Size Contributors

Here are just a few examples of what the local static pulled in:

Symbol                     Description                          Approx. Size
_d* symbols                Name demangling logic (libiberty)    ~40–50kB
__cxa_guard_*              One-time init guards                 ~1–2kB
__gnu_Unwind_*, __cxa_*    Exception runtime                    ~10–15kB

This was not caused by exceptions being used or thrown—it was caused by the compiler having to support the possibility of static initialization failure or multithreaded race conditions.

By contrast, declaring the same object globally allows the compiler to statically initialize it at startup—no guards, no exceptions, no registration, no RTTI.

Reflection

The common wisdom in embedded circles is that exceptions, virtual functions, and RTTI are too expensive to use. But in my experiment they barely made a measurable difference, if any, while a single static keyword in the wrong place turned out to be a far worse offender.

Modern compilers are smart. Features that were once heavyweight are now safe and efficient. Instead of blindly accepting tribal knowledge, we should:

  • Measure, don’t assume
  • Understand what actually gets pulled in
  • Challenge outdated beliefs
  • Not assume we are crazy for doing the same thing and expecting different results (maybe the compilers have changed?)

Because if we don’t, we may exclude ourselves from using powerful language features that can actually make our embedded code cleaner, safer, and more maintainable.

So the next time someone says, “exceptions are bad for embedded,” ask them: “Have you measured that lately?”

Literature:

For anyone who wants to dig deeper into exceptions and how they can make your embedded code smaller and faster, here is a cool video:
