GCC Flags & Optimization

Master GCC warning flags, optimization levels, sanitizers, link-time optimization, cross-compilation flags, and how to read assembly output to understand what the compiler produces.

Essential Warning Flags

{:.gc-basic}

Basic

GCC warnings catch real bugs at compile time — always enable them in development.

# Minimum recommended set
gcc -Wall -Wextra -Wpedantic -Werror -o program program.c

| Flag | What it catches |
| --- | --- |
| -Wall | Most common warnings (despite the name, not “all”) |
| -Wextra | Extra warnings not in -Wall |
| -Wpedantic | ISO C conformance violations |
| -Werror | Treat warnings as errors (enforces zero-warning policy) |
| -Wshadow | Variable shadows an outer scope variable |
| -Wformat=2 | String format vulnerabilities (printf etc.) |
| -Wconversion | Implicit type conversions that may lose data |
| -Wno-unused-parameter | Suppress warnings for intentionally unused params |
| -Wundef | Undefined macro used in #if |

# Production-grade warning set
CFLAGS = -Wall -Wextra -Wpedantic -Wshadow -Wformat=2 \
         -Wconversion -Wdouble-promotion -Wnull-dereference \
         -Wmisleading-indentation -Wstrict-prototypes
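
To make a few of these flags concrete, here is a minimal sketch of code written so that it compiles cleanly under them, with comments naming the pattern each flag would otherwise report. The file name and all function names are hypothetical and chosen only for illustration.

```c
/* warn_demo.c (hypothetical file name).
 * Try: gcc -Wall -Wextra -Wpedantic -Wshadow -Wconversion \
 *          -Wdouble-promotion -c warn_demo.c */
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* -Wconversion: `unsigned char b = value;` with an int value may lose data.
 * Clamping first and casting explicitly documents the intent. */
unsigned char to_byte(int value)
{
    if (value < 0)
        return 0;
    if (value > UCHAR_MAX)
        return UCHAR_MAX;
    return (unsigned char)value;
}

/* -Wdouble-promotion: `x * 0.5` would silently promote x to double.
 * The f suffix keeps the arithmetic in float. */
float half(float x)
{
    return x * 0.5f;
}

/* -Wshadow: declaring another `total` inside the loop body would hide the
 * outer one; a distinct name (`term`) avoids the ambiguity. */
int sum_first(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        int term = values[i];
        total += term;
    }
    return total;
}

/* -Wextra reports unused parameters; the (void) cast marks this one as
 * intentionally unused without disabling the warning globally. */
int on_event(int event_id, void *context)
{
    (void)context;
    return event_id;
}
```

Reintroducing any of the patterns described in the comments and recompiling with the same flags is a quick way to see the exact diagnostic text your GCC version prints.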

Optimization Levels

{:.gc-basic}

gcc -O0 -o program program.c    # No optimization (default) — fastest compile, easiest to debug
gcc -O1 -o program program.c    # Basic optimization
gcc -O2 -o program program.c    # Recommended for production
gcc -O3 -o program program.c    # Aggressive: extra loop transforms and vectorization; larger code, not always faster
gcc -Os -o program program.c    # Optimize for size (embedded systems)
gcc -Oz -o program program.c    # Minimize size even more (Clang, and GCC 12+)
gcc -Og -o program program.c    # Optimize for debugging (GCC 4.8+)

| Level | Use case |
| --- | --- |
| -O0 | Development, debugging |
| -O2 | Production builds |
| -O3 | Performance-critical code (benchmark first!) |
| -Os | Embedded systems where flash is limited |
| -Og | Debug builds that still have some optimization |
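
Code can also observe which kind of build it is part of. The sketch below relies on two macros GCC predefines: `__OPTIMIZE__` (any optimizing compilation) and `__OPTIMIZE_SIZE__` (optimizing for size). The function name `describe_build` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Writes a short description of the optimization mode into buf and returns
 * the value of snprintf (the number of characters the text needs). */
int describe_build(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    /* __OPTIMIZE_SIZE__ is checked first because size-optimizing builds
     * define __OPTIMIZE__ as well. */
#if defined(__OPTIMIZE_SIZE__)
    const char *flavor = "optimized for size";
#elif defined(__OPTIMIZE__)
    const char *flavor = "optimized for speed";
#else
    const char *flavor = "unoptimized";
#endif

    return snprintf(buf, size, "build: %s", flavor);
}
```

These macros are best kept for diagnostics such as a startup banner; making program logic depend on the optimization level makes bugs harder to reproduce.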

Debugging Flags

{:.gc-basic}

gcc -g   -o program program.c    # DWARF debug info (compatible with GDB, Valgrind)
gcc -g3  -o program program.c    # Include macro definitions
gcc -ggdb -o program program.c   # GDB-specific extensions

Never discard the debug info for binaries you may need for crash analysis. Build production binaries with -g and -O2; if you ship a stripped copy, store the unstripped binary separately for debugging.


Sanitizers

{:.gc-mid}

Intermediate

Compile-time instrumentation that detects bugs at runtime. AddressSanitizer typically costs about 2x in run time; ThreadSanitizer is considerably heavier (often 5x to 15x):

# AddressSanitizer — heap/stack overflows, use-after-free, double-free
gcc -fsanitize=address -g -o program program.c

# UndefinedBehaviorSanitizer — signed overflow, null deref, invalid shifts
gcc -fsanitize=undefined -g -o program program.c

# Both together (recommended during development)
gcc -fsanitize=address,undefined -g -o program program.c

# ThreadSanitizer — data races (can't combine with ASan)
gcc -fsanitize=thread -g -o program program.c
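
Signed overflow is one of the UBSan findings listed above. As a rough sketch of how such a report is usually fixed, the hypothetical helper `checked_add` tests the operands before adding, so the overflowing addition never executes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true and stores a + b in *out when the sum fits in an int.
 * Returns false and leaves *out untouched when the sum would overflow.
 * The comparisons are arranged so that they cannot overflow themselves. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

A plain `INT_MAX + 1` compiled with -fsanitize=undefined is reported at run time as a signed integer overflow; routing the addition through a check like this one removes the undefined behaviour rather than hiding the report. GCC 5 and later also offer the `__builtin_add_overflow` builtin for the same purpose.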

Example ASan output:

==42==ERROR: AddressSanitizer: stack-buffer-overflow
WRITE of size 4 at 0x7ffd1234 thread T0
    #0 0x401150 in fill_array program.c:8
    #1 0x4011a0 in main program.c:14
Shadow bytes around the buggy address:
  0x7ffd1230: 00 00 00 00 f2 f2 f2 f2
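
For orientation, the following is a hypothetical reconstruction of the kind of code that leads to a report like this one. It is not the program behind the trace above, and its line numbers will not match. The code as written is correct; the overflowing call appears only in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Writes `value` into the first `count` elements of `arr`.
 * The caller must guarantee that `arr` has at least `count` elements. */
void fill_array(int *arr, size_t count, int value)
{
    assert(arr != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        arr[i] = value;   /* each store is a 4-byte WRITE where int is 4 bytes */
}

/* Correct use: the element count is derived from the array itself. */
int fill_and_sum(void)
{
    int buf[4];
    fill_array(buf, sizeof buf / sizeof buf[0], 7);

    /* Bug for comparison: fill_array(buf, 5, 7) would write buf[4], one
     * element past the end of a stack array. ASan reports that as a
     * stack-buffer-overflow, with fill_array as frame #0 and the caller
     * as frame #1. */
    return buf[0] + buf[1] + buf[2] + buf[3];
}
```

In the shadow dump, 00 marks fully addressable memory and f2 marks a stack redzone between local variables, which is why a write landing there is flagged.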

Cross-Compilation Flags

{:.gc-mid}

# Target ARMv7 with hardware floating point (Raspberry Pi 2/3)
arm-linux-gnueabihf-gcc \
    -march=armv7-a \
    -mfpu=neon-vfpv4 \
    -mfloat-abi=hard \
    -O2 -o sensor sensor.c

# Target ARMv8 / AArch64
aarch64-linux-gnu-gcc \
    -march=armv8-a \
    -O2 -o sensor sensor.c

# Target bare-metal Cortex-M4 (no OS, no stdlib)
arm-none-eabi-gcc \
    -mcpu=cortex-m4 \
    -mthumb \
    -mfpu=fpv4-sp-d16 \
    -mfloat-abi=hard \
    -ffreestanding -nostdlib \
    -Os -o firmware.elf firmware.c
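
When the same source is built for several of these targets, the compiler's predefined macros show which configuration is active. This is a small sketch with hypothetical function names; the macros themselves (`__aarch64__`, `__arm__`, `__thumb__`, `__x86_64__`, `__ARM_NEON`, `__STDC_HOSTED__`) are predefined by GCC or the C standard.

```c
#include <assert.h>
#include <limits.h>

/* Compile-time checks are useful when cross-compiling: a wrong assumption
 * about the target stops the build instead of failing on the device. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(void *) == 4 || sizeof(void *) == 8,
              "this code assumes 32-bit or 64-bit pointers");

/* Names the architecture selected by the toolchain and its -march/-mcpu flags. */
const char *target_arch(void)
{
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__) && defined(__thumb__)
    return "arm (thumb)";     /* e.g. Cortex-M built with -mthumb */
#elif defined(__arm__)
    return "arm";
#elif defined(__x86_64__)
    return "x86-64";
#else
    return "other";
#endif
}

/* 1 for a normal hosted build, 0 when compiled with -ffreestanding. */
int is_hosted(void)
{
    return __STDC_HOSTED__;
}

/* 1 when NEON (Advanced SIMD) instructions are enabled for this build. */
int has_neon(void)
{
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}
```

Running `arm-linux-gnueabihf-gcc -dM -E -x c /dev/null` with the target flags of interest prints every predefined macro, which is a quick way to confirm what a given flag combination actually enables.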

Useful Analysis Flags

# Show preprocessed output
gcc -E program.c -o program.i

# Show assembly output
gcc -S -O2 program.c -o program.s

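# Show object file symbols (compile to an object file first)
gcc -c program.c -o program.o
nm program.o
readelf -s program.o

# Show the commands GCC runs for each stage (cc1, as, collect2) plus its configuration
gcc -v program.c -o program

# Report include search paths (-x c is needed because /dev/null has no file extension)
gcc -v -E -x c /dev/null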
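
A tiny input file makes these commands easier to interpret. The sketch below (hypothetical file and function names) contains two macros, so the effect of preprocessing is easy to spot in the `gcc -E` output.

```c
/* macros.c (hypothetical file name): input for gcc -E, gcc -S, and nm. */
#include <assert.h>

#define SQUARE(x) ((x) * (x))
#define SAMPLE_COUNT 16

/* Evaluated at compile time on the expanded text ((3) * (3)). */
static_assert(SQUARE(3) == 9, "SQUARE should expand to a plain product");

/* After `gcc -E`, the body reads: return ((n + 1) * (n + 1)); */
int square_of_next(int n)
{
    return SQUARE(n + 1);
}

/* After `gcc -E`, the body reads: return 16 * (int)sizeof(int); */
int sample_bytes(void)
{
    return SAMPLE_COUNT * (int)sizeof(int);
}
```

The preprocessed output also contains the expanded contents of assert.h, so the two functions appear near the end of it. After `gcc -c macros.c`, `nm macros.o` lists both functions with type T (defined in the text section).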

Advanced: Reading Assembly Output

{:.gc-adv}

Advanced

Understanding assembly output helps you verify optimizations and diagnose performance issues.

# Generate annotated assembly with source interleaved
gcc -O2 -g -S -fverbose-asm program.c -o program.s

# Or use objdump on the compiled binary
gcc -O2 -g -o program program.c
objdump -d -S program | less

Example — comparing optimization levels:

int sum_array(int *arr, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += arr[i];
    return s;
}
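
# -O0: naive loop; s and i are stored to and reloaded from the stack on every iteration
# -O2: s and i live in registers; typically a tight scalar loop (no unrolling unless -funroll-loops)
# -O3 with AVX2: SIMD vectorization using ymm registers (processes 8 ints at once)
gcc -O3 -march=native -S sum.c    # -march=native selects AVX2 only if the build host supports it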
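
A useful exercise when reading the output is to compare an integer reduction with the same loop over float. The sketch below uses hypothetical function names. The comments describe what GCC typically emits on x86-64, which is an expectation to verify in your own assembly output, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

static_assert(sizeof(int) == 4 && sizeof(float) == 4,
              "the lane counts in the comments assume 32-bit int and float");

/* Integer reduction: the additions can be regrouped freely, so at -O3 the
 * vectorizer can keep 8 partial sums in one 256-bit ymm register. */
int sum_ints(const int *arr, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += arr[i];
    return s;
}

/* Float reduction: regrouping changes rounding, so without -ffast-math GCC
 * typically keeps this loop scalar (one addss per element), even at -O3. */
float sum_floats(const float *arr, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += arr[i];
    return s;
}
```

Compiling with `gcc -O3 -march=native -S` and searching the output for `ymm` in each function shows the difference directly. Adding `-fopt-info-vec` makes GCC report which loops it vectorized.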

Link-Time Optimization (LTO)

LTO lets the compiler optimize across translation unit boundaries:

gcc -O2 -flto -o program main.c utils.c sensor.c
# GCC sees all source files simultaneously at link time
# Can inline functions across files, eliminate dead code globally
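
The kind of code that benefits looks roughly like the sketch below. It is shown as one listing so it can be compiled as is, but the two halves stand in for separate files (utils.c and sensor.c from the command above). The function names are hypothetical.

```c
#include <assert.h>

/* ---- would live in utils.c ---- */

/* Small helper: cheap to inline, but invisible to callers in other
 * translation units when each file is compiled on its own. */
int clamp_int(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

/* ---- would live in sensor.c ---- */

int clamp_int(int value, int lo, int hi);   /* normally declared in utils.h */

/* Converts a raw reading to a percentage of a 10-bit range (0..1023).
 * Without LTO this is an out-of-line call into utils.o. With -flto the
 * optimizer can inline clamp_int and fold the constant bounds 0 and 1023. */
int scale_reading(int raw)
{
    return clamp_int(raw, 0, 1023) * 100 / 1023;
}
```

To check whether the inlining happened, disassemble the final binary with `objdump -d program` and look for a call to clamp_int inside scale_reading.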

Profile-Guided Optimization (PGO)

Let real workload data guide optimization:

# Step 1: Instrument build
gcc -fprofile-generate -O2 -o program_inst program.c

# Step 2: Run with representative data (the instrumented binary writes .gcda profile files on exit)
./program_inst < workload.dat

# Step 3: Optimized build using collected profiles
gcc -fprofile-use -O2 -o program_pgo program.c
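
PGO pays off in code whose branch behaviour depends on the input. The function below is a hypothetical example of that shape: for text-heavy input one branch is taken almost every time, which the compiler cannot know without a profile.

```c
#include <assert.h>
#include <stddef.h>

/* Counts bytes outside the printable ASCII range 0x20..0x7e.
 * With a text workload the `continue` path dominates. A profile collected
 * with -fprofile-generate records that, and -fprofile-use can then lay out
 * the loop so that the common path falls through. */
size_t count_unprintable(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    size_t unprintable = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] >= 0x20 && data[i] < 0x7f)
            continue;       /* hot path for text input */
        unprintable++;      /* cold path for text input */
    }
    return unprintable;
}
```

The profile is only as good as the workload in step 2: training on binary data and then deploying on text would teach the compiler the opposite branch bias.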

Interview Q&A

{:.gc-iq}

Q1 — Basic: What is the difference between -O2 and -O3?

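-O2 enables the broad, well-tested set of optimizations that mostly avoid trading code size for speed; this set already includes strict-aliasing-based analysis. -O3 adds more aggressive transformations, such as loop unswitching and peeling, function cloning, and a more liberal vectorizer cost model, which tend to increase code size and compile time. Neither level changes the behaviour of conforming code: reordering floating-point operations requires -ffast-math, not -O3. What -O3 can do is expose latent undefined behaviour sooner, and occasionally make a program slower through instruction-cache pressure. Use -O2 by default; only switch to -O3 after profiling and testing.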

Q2 — Intermediate: What is strict aliasing and why can it cause bugs?

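The strict aliasing rule states that an object may only be accessed through an lvalue of a compatible type, with the character types (char, signed char, unsigned char) as the main exception. GCC enables -fstrict-aliasing at -O2 and above (and at -Os), which lets it assume that pointers to unrelated types do not alias, so it can keep values in registers and elide or reorder loads and stores. Code that violates the rule (e.g., type-punning via *(int *)&float_val) has undefined behaviour and may produce wrong results after optimization. Safe alternatives: use memcpy, use a union (allowed in C99 and later and documented by GCC, but not valid in standard C++), or pass -fno-strict-aliasing.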
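
Both safe alternatives fit in a few lines. This is a minimal sketch with hypothetical function names, and it assumes an IEEE-754 32-bit float, which the static_assert checks only in size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "these helpers assume a 32-bit float");

/* memcpy-based punning: well defined in both C and C++. At -O2 GCC
 * normally compiles the memcpy down to a single register move. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Union-based punning: valid in C99 and later, and supported by GCC. */
uint32_t float_bits_union(float value)
{
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = value;
    return pun.u;
}
```

For example, `float_bits(1.0f)` yields 0x3f800000 on an IEEE-754 target. The cast-based version may return the same value at -O0 and then misbehave at -O2, which is what makes aliasing bugs hard to track down.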

Q3 — Intermediate: How does AddressSanitizer work and what overhead does it add?

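ASan reserves a region of “shadow memory” that describes the state of the application's whole address space: each aligned 8 bytes of real memory map to 1 shadow byte that records how many of those bytes are addressable. Before every memory access, compiler-inserted code checks the corresponding shadow byte. Heap, stack, and global objects are surrounded by poisoned “redzones” to catch overflows, and freed heap memory is poisoned and held in quarantine to catch use-after-free. Typical overhead: roughly 2× runtime and often 2× to 3× memory, which is acceptable for CI/CD but not for production.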
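
The shadow lookup is simple arithmetic. The sketch below models it in plain C under two assumptions: the shadow offset is the default used on x86-64 Linux, and the access does not cross an 8-byte boundary. The function names are hypothetical and this is a model of the check, not ASan's actual runtime code.

```c
#include <assert.h>
#include <stdint.h>

#define SHADOW_SCALE  3                     /* 2^3 = 8 application bytes per shadow byte */
#define SHADOW_OFFSET UINT64_C(0x7fff8000)  /* assumption: x86-64 Linux default */

/* Address of the shadow byte that describes app_address. */
uint64_t shadow_address(uint64_t app_address)
{
    return (app_address >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* Decides whether an access of `size` bytes (1..8) at app_address should be
 * reported, given the shadow byte for its 8-byte granule:
 *   0        all 8 bytes are addressable
 *   1..7     only the first k bytes are addressable
 *   negative poisoned (for example 0xf2 read as signed is -14, a stack redzone) */
int access_is_bad(int8_t shadow_value, uint64_t app_address, unsigned size)
{
    assert(size >= 1 && size <= 8);
    if (shadow_value == 0)
        return 0;
    /* Offset of the last byte touched within the granule. */
    int last = (int)(app_address & 7) + (int)size - 1;
    return last >= shadow_value;
}
```

With a shadow value of 4, a 4-byte read at offset 0 of the granule passes, while the same read at offset 2 touches bytes 2 to 5 and is reported.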

Q4 — Advanced: Explain LTO and when it provides the most benefit.

Link-Time Optimization allows GCC to inline and optimize across .c file boundaries. Normally the compiler sees only one translation unit at a time, so functions defined in other .c files can't be inlined. With -flto, each .o file carries GCC's intermediate representation (GIMPLE), and at link time all of that IR is handed back to the optimizer at once. Pass -flto, together with the optimization level, at both the compile and the link step. LTO helps most when many small, frequently called helper functions live in separate utility modules: cross-file inlining removes the call overhead and enables further constant propagation and dead-code elimination.


References

{:.gc-ref}

| Resource | Link |
| --- | --- |
| GCC Optimize Options | gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html |
| GCC Warning Options | gcc.gnu.org/onlinedocs/gcc/Warning-Options.html |
| Compiler Explorer (Godbolt) | godbolt.org (see assembly output live) |
| AddressSanitizer wiki | github.com/google/sanitizers |