Chapter 3 of 20

Tiva C Architecture — Tiva C

Eslam El Hefny Apr 3, 2025 7 min read


Overview

The TM4C123GH6PM is built on the ARM Cortex-M4F processor: a 32-bit RISC core with a single-precision hardware Floating-Point Unit (FPU), a modified Harvard bus architecture, and a tightly integrated NVIC. Understanding the CPU pipeline, register file, and memory map is the foundation for writing fast, correct embedded code.


Beginner Level — What & Why

What is a Microcontroller Architecture?

A microcontroller architecture defines how the CPU, memory, and peripherals are connected and how they communicate. Think of it as the city plan of the chip — roads (buses), buildings (memory blocks), and offices (peripherals) all have defined addresses.

Real-World Analogy

The ARM Cortex-M4 is like a city hall with 16 employee desks (registers). Some desks have special jobs: one always knows the current task (PC), one holds the return address after a phone call (LR), one manages the stack of papers being worked on (SP). The city’s postal system (memory bus) uses fixed address ranges for different districts (Flash, SRAM, peripherals).

What Problem Does Architecture Knowledge Solve?

  • Knowing the memory map lets you calculate addresses for bare-metal register access.
  • Understanding the pipeline helps you predict interrupt latency.
  • Knowing the register file helps you read disassembly and write efficient inline assembly.

Key Terms

Term                   Meaning
Harvard architecture   Separate instruction and data buses
Pipeline               Fetch, Decode, and Execute stages overlap, so up to three instructions are in flight at once
NVIC                   Nested Vectored Interrupt Controller
FPU                    Hardware Floating-Point Unit (single-precision only)
MPU                    Memory Protection Unit
SysTick                24-bit system timer
Thumb-2                Mixed 16/32-bit instruction set used by the Cortex-M4

Intermediate Level — How It Works

ARM Cortex-M4F Pipeline

The Cortex-M4 uses a 3-stage pipeline: Fetch → Decode → Execute. There is no dynamic branch predictor, so a taken branch costs extra cycles (typically 1 to 3) while the pipeline refills. The FPU adds an additional execute stage for floating-point operations.

Key pipeline facts:

  • Single-cycle multiply (32-bit)
  • 2–12 cycle divide (hardware SDIV/UDIV)
  • FPU: single-precision add/multiply in 1 cycle, division in 14 cycles
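  • Interrupt entry: 12 cycles with zero-wait-state memory; lazy FPU stacking (enabled at reset) avoids adding FPU save time to this, while disabling it makes entry longer whenever FPU context must be saved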
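
To put these cycle counts in perspective, the small host-side helper below converts cycles to nanoseconds at a given core clock. It is a sketch under the assumption of zero-wait-state memory, so treat the result as a lower bound: Flash wait states and bus contention on real hardware can add cycles. The function name cyclesToNs is illustrative, not part of TivaWare.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds at a given core clock.
 * Uses 64-bit integer math so cycles * 1e9 cannot overflow. */
static uint64_t cyclesToNs(uint32_t cycles, uint32_t clockHz)
{
    assert(clockHz != 0u);
    return ((uint64_t)cycles * 1000000000ull) / clockHz;
}
```

At 80 MHz, cyclesToNs(12, 80000000) evaluates to 150 ns of interrupt entry, and a 14-cycle FPU divide takes 175 ns.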

CPU Register File

Register   Name              Purpose
R0–R3      Argument/Result   Function arguments and return values
R4–R11     Saved             Must be preserved across function calls (callee-saved)
R12        IP                Intra-procedure-call scratch register
R13 (SP)   Stack Pointer     Points to top of the current stack (MSP or PSP)
R14 (LR)   Link Register     Holds the return address after BL/BLX; on exception entry, holds an EXC_RETURN code
R15 (PC)   Program Counter   Address of next instruction to fetch
xPSR       Program Status    N, Z, C, V, Q flags, IT/GE bits, exception number, Thumb bit

The Cortex-M4F has two stack pointers:

  • MSP (Main Stack Pointer): used in Handler mode, and in Thread mode by default (until an RTOS switches Thread mode to the PSP)
  • PSP (Process Stack Pointer): used by RTOS tasks in Thread mode
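
On exception entry the core loads LR with a special EXC_RETURN value instead of a normal return address, and its low bits record which stack pointer holds the stacked frame. The sketch below decodes those bits on the host, following the Armv7-M encoding: bit 2 selects PSP (1) or MSP (0), bit 3 selects a return to Thread (1) or Handler (0) mode, and bit 4 is 0 when the frame includes FPU state. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool returnsToThread;  /* bit 3: 1 = Thread mode, 0 = Handler mode  */
    bool usesPsp;          /* bit 2: 1 = frame on PSP, 0 = frame on MSP */
    bool extendedFrame;    /* bit 4: 0 = frame includes FPU registers   */
} ExcReturnInfo;

static ExcReturnInfo decodeExcReturn(uint32_t lr)
{
    /* Every valid EXC_RETURN value has bits [31:5] set. */
    assert((lr & 0xFFFFFFE0u) == 0xFFFFFFE0u);
    ExcReturnInfo info;
    info.returnsToThread = (lr & (1u << 3)) != 0u;
    info.usesPsp         = (lr & (1u << 2)) != 0u;
    info.extendedFrame   = (lr & (1u << 4)) == 0u;
    return info;
}
```

For example, 0xFFFFFFFD means "return to Thread mode using the PSP, basic frame", which is what an RTOS task sees when it is interrupted.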

Memory Map

The Cortex-M4 implements a fixed 4 GB address space divided into defined regions:

Region            Start Address  End Address  Size    Contents
Code              0x00000000     0x1FFFFFFF   512 MB  Flash (256 KB on TM4C123: 0x00000000–0x0003FFFF)
SRAM              0x20000000     0x3FFFFFFF   512 MB  SRAM (32 KB on TM4C123: 0x20000000–0x20007FFF)
Peripheral        0x40000000     0x5FFFFFFF   512 MB  All on-chip peripherals
External RAM      0x60000000     0x9FFFFFFF   1 GB    (not used on TM4C123)
External Device   0xA0000000     0xDFFFFFFF   1 GB    (not used on TM4C123)
System            0xE0000000     0xFFFFFFFF   512 MB  PPB at 0xE0000000–0xE00FFFFF (NVIC, SysTick, SCB, MPU, FPU); vendor-specific above
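
Because the map is fixed, classifying an address is a chain of range checks. The host-side sketch below does that for the Cortex-M4 regions, splitting the System region into the Private Peripheral Bus (0xE0000000–0xE00FFFFF) and the vendor-specific area above it, and adds bounds checks for the TM4C123's 256 KB Flash and 32 KB SRAM. All names are illustrative; a check like isOnChipSram() could, for instance, validate a pointer before dereferencing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    REGION_CODE, REGION_SRAM, REGION_PERIPHERAL, REGION_EXT_RAM,
    REGION_EXT_DEVICE, REGION_PPB, REGION_VENDOR
} MemRegion;

/* Classify a 32-bit address using the fixed Cortex-M4 memory map. */
static MemRegion memoryRegion(uint32_t addr)
{
    if (addr <= 0x1FFFFFFFu) return REGION_CODE;
    if (addr <= 0x3FFFFFFFu) return REGION_SRAM;
    if (addr <= 0x5FFFFFFFu) return REGION_PERIPHERAL;
    if (addr <= 0x9FFFFFFFu) return REGION_EXT_RAM;
    if (addr <= 0xDFFFFFFFu) return REGION_EXT_DEVICE;
    if (addr <= 0xE00FFFFFu) return REGION_PPB;     /* Private Peripheral Bus */
    return REGION_VENDOR;                           /* 0xE0100000 and above   */
}

/* TM4C123GH6PM on-chip memories: 256 KB Flash, 32 KB SRAM. */
static bool isOnChipFlash(uint32_t addr) { return addr < 0x00040000u; }

static bool isOnChipSram(uint32_t addr)
{
    return addr >= 0x20000000u && addr < 0x20008000u;
}
```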

TM4C123-specific addresses (GPIO ports shown at their APB aperture):

Flash start  : 0x00000000  (256 KB)
SRAM start   : 0x20000000  (32 KB)
GPIO Port A  : 0x40004000
GPIO Port B  : 0x40005000
GPIO Port C  : 0x40006000
GPIO Port D  : 0x40007000
GPIO Port E  : 0x40024000
GPIO Port F  : 0x40025000
UART0        : 0x4000C000
SysTick      : 0xE000E010
NVIC         : 0xE000E100
SCB          : 0xE000ED00
MPU          : 0xE000ED90
FPU          : 0xE000EF30
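
Ports E and F do not continue the A–D sequence, so a port's base address cannot be computed from its letter. A lookup table avoids that trap. The sketch below uses the APB addresses listed above and assumes each port occupies a 4 KB block; the function names and the 0x400 offset in the usage example are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* APB-aperture GPIO base addresses for the TM4C123GH6PM. */
static uint32_t gpioPortBase(char port)
{
    switch (port) {
    case 'A': return 0x40004000u;
    case 'B': return 0x40005000u;
    case 'C': return 0x40006000u;
    case 'D': return 0x40007000u;
    case 'E': return 0x40024000u;   /* note the jump after port D */
    case 'F': return 0x40025000u;
    default:  return 0u;            /* not a GPIO port on this part */
    }
}

/* Absolute address of a register at a byte offset inside a port's 4 KB block. */
static uint32_t gpioRegAddr(char port, uint32_t offset)
{
    uint32_t base = gpioPortBase(port);
    assert(base != 0u && offset < 0x1000u);
    return base + offset;
}
```

For example, gpioRegAddr('F', 0x400) yields 0x40025400.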

Harvard vs Von Neumann

The Cortex-M4 uses a modified Harvard architecture:

  • Instruction bus (I-Code): reads from Flash (Code region)
  • Data bus (D-Code): data and debug accesses to the Code region, such as loading constants (literal pools) from Flash
  • System bus (AHB): accesses SRAM and peripherals

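This allows simultaneous instruction fetch and data access, increasing throughput. In practice, most TM4C123 peripherals sit on the APB (Advanced Peripheral Bus), which is bridged from the AHB, so APB register accesses take more cycles than AHB accesses. The GPIO ports can alternatively be accessed through an AHB aperture for faster back-to-back access.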

NVIC (Nested Vectored Interrupt Controller)

  • Supports up to 240 external interrupts on Cortex-M4
  • TM4C123GH6PM implements 78 interrupts (interrupt numbers run from 0 to 138, with reserved gaps)
  • 8 priority levels (3-bit field, bits [7:5] of priority byte; bits [4:0] ignored)
  • Priority 0 = highest, Priority 7 = lowest
  • Priority grouping: split preemption bits and sub-priority bits
  • Vector table starts at 0x00000000 (Flash); can be relocated via VTOR register
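
The bullets above translate directly into register arithmetic. The host-side sketch below computes the priority byte for a 0–7 priority (only bits [7:5] are implemented), plus the Set-Enable register address and bit for a given IRQ number, using the Armv7-M layout: ISER registers start at 0xE000E100 (the NVIC address listed earlier) with 32 IRQs per word, and the byte-wide priority registers start at 0xE000E400. Function names are hypothetical helpers, not TivaWare calls.

```c
#include <assert.h>
#include <stdint.h>

#define NVIC_ISER0_ADDR  0xE000E100u  /* Interrupt Set-Enable, 32 IRQs per word */
#define NVIC_IPR0_ADDR   0xE000E400u  /* Interrupt Priority, one byte per IRQ   */

/* Encode priority 0 (highest) to 7 (lowest) into bits [7:5] of the byte. */
static uint8_t nvicPriorityByte(uint8_t priority)
{
    assert(priority <= 7u);
    return (uint8_t)(priority << 5);
}

/* ISER word holding the enable bit for an IRQ, and the bit within it. */
static uint32_t nvicIserAddr(uint32_t irq)
{
    assert(irq < 240u);
    return NVIC_ISER0_ADDR + 4u * (irq / 32u);
}

static uint32_t nvicIserBit(uint32_t irq)
{
    assert(irq < 240u);
    return 1u << (irq % 32u);
}

/* Byte address of the priority field for an IRQ. */
static uint32_t nvicIprByteAddr(uint32_t irq)
{
    assert(irq < 240u);
    return NVIC_IPR0_ADDR + irq;
}
```

Because bits [4:0] are ignored, writing priority 5 means storing 0xA0 in the IRQ's priority byte.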

SysTick

A 24-bit down-counting timer on the Private Peripheral Bus:

  • Counts from the RELOAD value down to 0, then reloads, so one period is RELOAD + 1 clock cycles
  • Can generate an interrupt at zero
  • Used for RTOS time-slicing and simple delays
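
Since one period is RELOAD + 1 clocks and RELOAD is only 24 bits wide, the longest period at 80 MHz is 2^24 / 80 MHz, about 0.21 s. The sketch below computes a RELOAD value for a desired tick rate and rejects rates that do not fit (a RELOAD of 0 is also rejected, because it never produces a tick). The function name is illustrative. If you use TivaWare's SysTickPeriodSet() instead, pass the period (RELOAD + 1), since the driver subtracts one itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* 24-bit counter */

/* Compute RELOAD for tickHz ticks per second; false if it cannot fit. */
static bool sysTickReloadFor(uint32_t clockHz, uint32_t tickHz, uint32_t *reload)
{
    assert(tickHz != 0u && reload != NULL);
    uint32_t clocksPerTick = clockHz / tickHz;   /* truncated */
    if (clocksPerTick < 2u || clocksPerTick - 1u > SYSTICK_MAX_RELOAD) {
        return false;                            /* too fast or too slow */
    }
    *reload = clocksPerTick - 1u;
    return true;
}
```

A 1 kHz tick at 80 MHz gives RELOAD = 79999, while a 1 Hz tick does not fit in 24 bits.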

Advanced Level — Deep Dive

xPSR Register Fields

Bit 31 (N): Negative flag
Bit 30 (Z): Zero flag
Bit 29 (C): Carry flag
Bit 28 (V): Overflow flag
Bit 27 (Q): Saturation flag (DSP)
Bits 26:25 (IT[1:0]): Thumb IT state
Bit 24 (T): Thumb bit — always 1 on Cortex-M4
Bits 19:16 (GE[3:0]): SIMD Greater-than-or-Equal flags
Bits 15:10 (IT[7:2]): Thumb IT state continuation
Bits 8:0 (ISR): Active exception number (0 = Thread mode; external IRQ n appears as 16 + n)
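
The layout above can be decoded with shifts and masks. The host-side sketch below does so for a raw xPSR value, for example one taken from a stacked exception frame, including reassembly of the split IT field (IT[1:0] in bits 26:25, IT[7:2] in bits 15:10). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n, z, c, v, q, t;
    uint8_t  ge;    /* GE[3:0], bits 19:16                      */
    uint8_t  it;    /* IT[7:0] rebuilt from bits 15:10 and 26:25 */
    uint16_t isr;   /* bits 8:0, active exception number         */
} XpsrFields;

static XpsrFields decodeXpsr(uint32_t xpsr)
{
    XpsrFields f;
    f.n   = ((xpsr >> 31) & 1u) != 0u;
    f.z   = ((xpsr >> 30) & 1u) != 0u;
    f.c   = ((xpsr >> 29) & 1u) != 0u;
    f.v   = ((xpsr >> 28) & 1u) != 0u;
    f.q   = ((xpsr >> 27) & 1u) != 0u;
    f.t   = ((xpsr >> 24) & 1u) != 0u;
    f.ge  = (uint8_t)((xpsr >> 16) & 0xFu);
    f.it  = (uint8_t)((((xpsr >> 10) & 0x3Fu) << 2) | ((xpsr >> 25) & 0x3u));
    f.isr = (uint16_t)(xpsr & 0x1FFu);
    return f;
}
```

For instance, 0x8100002E decodes to N = 1, T = 1, and exception number 46 (IRQ 30).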

FPU Activation

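The FPU is disabled at reset, and executing a floating-point instruction before enabling it raises a UsageFault. Enable it before using float (or double, when building with the hard-float ABI):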

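/* Enable FPU: set CP10 and CP11 to Full Access in CPACR */
/* CPACR is at SCB base + 0x088 = 0xE000ED88              */
/* HWREG comes from inc/hw_types.h. TivaWare's FPUEnable() */
/* (driverlib/fpu.h) sets the same CPACR bits.             */
HWREG(0xE000ED88) |= (0xF << 20);  // bits [23:20] = 1111b
__DSB();  // Data Synchronization Barrier (CMSIS intrinsic)
__ISB();  // Instruction Synchronization Barrier (CMSIS intrinsic)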

TivaWare’s startup code in startup_ccs.c does this automatically when you enable FPU support in CCS project settings (Properties → Build → ARM Compiler → Advanced Options → Language Options → Enable FPU support).
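
The CPACR value itself is simple bit arithmetic: CP10 occupies bits [21:20] and CP11 bits [23:22], and both must be set to 0b11 (full access) for the FPU to be usable. The host-side sketch below computes and checks that value; the macro and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPACR_CP10_SHIFT 20u
#define CPACR_CP11_SHIFT 22u
#define CPACR_FULL       0x3u   /* 0b11 = full access */

/* Return cpacr with CP10 and CP11 set to full access. */
static uint32_t cpacrWithFpuEnabled(uint32_t cpacr)
{
    return cpacr | (CPACR_FULL << CPACR_CP10_SHIFT) | (CPACR_FULL << CPACR_CP11_SHIFT);
}

/* True only if both coprocessor fields grant full access. */
static bool cpacrFpuEnabled(uint32_t cpacr)
{
    return ((cpacr >> CPACR_CP10_SHIFT) & 0x3u) == CPACR_FULL &&
           ((cpacr >> CPACR_CP11_SHIFT) & 0x3u) == CPACR_FULL;
}
```

Starting from the reset value 0, the result is 0x00F00000, which is exactly what (0xF << 20) produces.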

Bit-Banding

The Cortex-M4 supports bit-banding in two 1 MB regions; each bit in those regions is mapped to its own 32-bit word in a 32 MB alias region:

SRAM bit-band region    : 0x20000000 – 0x200FFFFF
SRAM bit-band alias     : 0x22000000 – 0x23FFFFFF

Peripheral bit-band     : 0x40000000 – 0x400FFFFF
Peripheral bit-band alias: 0x42000000 – 0x43FFFFFF

Formula:

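#include "inc/hw_memmap.h"   /* GPIO_PORTF_BASE */
#include "inc/hw_gpio.h"     /* GPIO_O_DATA     */
#include "inc/hw_types.h"    /* HWREG           */

#define BITBAND_PERIPH(addr, bit) \
    (0x42000000 + (((addr) - 0x40000000) * 32) + ((bit) * 4))

/* Set PF1 (red LED, already configured as a digital output) through */
/* the alias of GPIODATA bit 1. Offset 0x3FC selects GPIODATA with   */
/* all eight address-mask bits enabled.                              */
#define PF1_BITBAND  BITBAND_PERIPH(GPIO_PORTF_BASE + GPIO_O_DATA + 0x3FC, 1)
HWREG(PF1_BITBAND) = 1;  // single-bit write; the bus does the read-modify-write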
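
To check the formula without hardware, the host-side sketch below computes alias addresses for both bit-band regions and returns 0 for addresses outside them. The function name is illustrative; the constants are the region bounds listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Alias word for bit `bit` of the byte-addressed location `addr`.
 * Returns 0 if addr lies outside both 1 MB bit-band regions. */
static uint32_t bitbandAlias(uint32_t addr, uint32_t bit)
{
    assert(bit < 32u);
    if (addr >= 0x20000000u && addr <= 0x200FFFFFu)          /* SRAM */
        return 0x22000000u + (addr - 0x20000000u) * 32u + bit * 4u;
    if (addr >= 0x40000000u && addr <= 0x400FFFFFu)          /* peripherals */
        return 0x42000000u + (addr - 0x40000000u) * 32u + bit * 4u;
    return 0u;
}
```

For GPIO Port F's fully unmasked data address (0x400253FC), bit 1 maps to alias word 0x424A7F84.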

Stack Layout at Exception Entry

When an interrupt fires, the Cortex-M4 hardware automatically pushes 8 registers:

Stack grows downward:
 SP+28  xPSR
 SP+24  PC  (return address)
 SP+20  LR
 SP+16  R12
 SP+12  R3
 SP+8   R2
 SP+4   R1
 SP+0   R0  ← new SP after push

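If the interrupted code was using the FPU, the frame is extended by 18 words (S0–S15, FPSCR, and a reserved word), making the full frame 26 words. With lazy stacking (the reset default), space for these words is reserved on entry, but the registers are only written if the handler itself executes a floating-point instruction.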
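
The frame layout can be captured as C structs, with compile-time checks that the offsets match the diagram. This is a sketch with hypothetical type names; a fault handler could, for example, cast the stacked SP to a const BasicExceptionFrame pointer and read the pc member to find the faulting instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic frame pushed by hardware, lowest address (new SP) first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} BasicExceptionFrame;

/* Extended frame used when the interrupted context had active FPU state. */
typedef struct {
    BasicExceptionFrame basic;
    uint32_t s[16];      /* S0-S15 */
    uint32_t fpscr;
    uint32_t reserved;   /* pads the frame to 26 words */
} ExtendedExceptionFrame;

_Static_assert(sizeof(BasicExceptionFrame) == 8u * 4u, "basic frame is 8 words");
_Static_assert(offsetof(BasicExceptionFrame, pc) == 24u, "stacked PC at SP+24");
_Static_assert(sizeof(ExtendedExceptionFrame) == 26u * 4u, "extended frame is 26 words");
```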


Step-by-Step Example

This example reads the CPU identification and vector-table registers and prints them over UART0 using TivaWare's UARTprintf.

/*
 * arch_info.c
 * Reads Cortex-M4 CPUID and prints via UART0
 * Board  : TM4C123GXL EK LaunchPad
 * SDK    : TivaWare_C_Series-2.2.x
 */

#include <stdint.h>
#include <stdbool.h>
#include "inc/hw_memmap.h"
#include "inc/hw_types.h"
#include "driverlib/sysctl.h"
#include "driverlib/gpio.h"
#include "driverlib/uart.h"
#include "driverlib/pin_map.h"   // requires PART_TM4C123GH6PM to be defined
#include "utils/uartstdio.h"     // also add utils/uartstdio.c to the build

/* SCB CPUID register — contains implementer, variant, arch, partno, revision */
#define SCB_CPUID   0xE000ED00

/* SCB VTOR — Vector Table Offset Register */
#define SCB_VTOR    0xE000ED08

static void UART0_Init(void);

int main(void)
{
    uint32_t ui32CPUID;
    uint32_t ui32VTOR;
    uint32_t ui32SysClk;

    /* Configure system clock to 80 MHz */
    SysCtlClockSet(SYSCTL_SYSDIV_2_5 | SYSCTL_USE_PLL |
                   SYSCTL_OSC_MAIN   | SYSCTL_XTAL_16MHZ);

    ui32SysClk = SysCtlClockGet();  // Should return 80000000

    UART0_Init();  // Initialise UART0 for UARTprintf

    /* Read CPUID register */
    ui32CPUID = HWREG(SCB_CPUID);
    ui32VTOR  = HWREG(SCB_VTOR);

    /* UARTprintf already expands "\n" to "\r\n" */
    UARTprintf("=== TM4C123 Architecture Info ===\n");
    UARTprintf("System Clock  : %u Hz\n", ui32SysClk);
    UARTprintf("CPUID         : 0x%08X\n", ui32CPUID);
    /* Implementer [31:24]=0x41=ARM, PartNo [15:4]=0xC24=Cortex-M4 */
    UARTprintf("Implementer   : 0x%02X (0x41 = ARM Ltd)\n",
               (ui32CPUID >> 24) & 0xFF);
    UARTprintf("Part Number   : 0x%03X (0xC24 = Cortex-M4)\n",
               (ui32CPUID >> 4) & 0xFFF);
    UARTprintf("Vector Table  : 0x%08X\n", ui32VTOR);

    /* Enable FPU manually for demonstration */
    HWREG(0xE000ED88) |= (0xF << 20);
    UARTprintf("FPU CPACR     : 0x%08X (CP10/CP11 enabled)\n",
               HWREG(0xE000ED88));

    while (1)
    {
        /* Idle loop */
    }
}

static void UART0_Init(void)
{
    /* Enable UART0 and GPIO Port A clocks */
    SysCtlPeripheralEnable(SYSCTL_PERIPH_UART0);
    SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOA);
    while (!SysCtlPeripheralReady(SYSCTL_PERIPH_UART0));
    while (!SysCtlPeripheralReady(SYSCTL_PERIPH_GPIOA));

    /* Configure PA0 = UART0 RX, PA1 = UART0 TX */
    GPIOPinConfigure(GPIO_PA0_U0RX);
    GPIOPinConfigure(GPIO_PA1_U0TX);
    GPIOPinTypeUART(GPIO_PORTA_BASE, GPIO_PIN_0 | GPIO_PIN_1);

    /* Use UARTStdio at 115200, 8N1 */
    UARTStdioConfig(0, 115200, SysCtlClockGet());
}
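
The example prints only the implementer and part number. The host-side sketch below decodes every CPUID field so you can compare a printed value against the Arm documentation. The sample value 0x410FC241 (Cortex-M4 r0p1) is only an illustration, not necessarily what your board reports, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  implementer;   /* [31:24] 0x41 = ARM Ltd         */
    uint8_t  variant;       /* [23:20] major revision, the rN */
    uint8_t  architecture;  /* [19:16] reads 0xF on ARMv7-M   */
    uint16_t partNo;        /* [15:4]  0xC24 = Cortex-M4      */
    uint8_t  revision;      /* [3:0]   minor revision, the pN */
} CpuIdFields;

static CpuIdFields decodeCpuId(uint32_t cpuid)
{
    CpuIdFields f;
    f.implementer  = (uint8_t)((cpuid >> 24) & 0xFFu);
    f.variant      = (uint8_t)((cpuid >> 20) & 0xFu);
    f.architecture = (uint8_t)((cpuid >> 16) & 0xFu);
    f.partNo       = (uint16_t)((cpuid >> 4) & 0xFFFu);
    f.revision     = (uint8_t)(cpuid & 0xFu);
    return f;
}
```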

Summary

Key Point           Details
CPU core            ARM Cortex-M4F (with single-precision FPU)
Clock speed         80 MHz (16 MHz crystal, 400 MHz PLL ÷ 2 ÷ 2.5)
Flash               256 KB at 0x00000000
SRAM                32 KB at 0x20000000
Peripheral base     0x40000000
PPB (NVIC, SCB)     0xE0000000
Pipeline stages     3 (Fetch, Decode, Execute)
Interrupt latency   12 cycles (zero wait states; lazy FPU stacking avoids extra delay)
Stack pointers      MSP (Handler mode, default) and PSP (RTOS tasks)
Instruction set     Thumb-2 (16/32-bit mixed)

Next Chapter

Introduction to GPIO
