Tiva C Architecture — Tiva C
- Eslam El Hefny
- Tutorials, Tiva c
- April 3, 2025
Overview
The TM4C123GH6PM is built on the ARM Cortex-M4F processor — a 32-bit RISC core with a hardware Floating-Point Unit (FPU), Harvard architecture, and a tightly integrated NVIC. Understanding the CPU pipeline, register file, and memory map is the foundation for writing fast, correct embedded code.
Beginner Level — What & Why
What is a Microcontroller Architecture?
A microcontroller architecture defines how the CPU, memory, and peripherals are connected and how they communicate. Think of it as the city plan of the chip — roads (buses), buildings (memory blocks), and offices (peripherals) all have defined addresses.
Real-World Analogy
The ARM Cortex-M4 is like a city hall with 16 employee desks (registers). Some desks have special jobs: one always knows the current task (PC), one holds the return address after a phone call (LR), one manages the stack of papers being worked on (SP). The city’s postal system (memory bus) uses fixed address ranges for different districts (Flash, SRAM, peripherals).
What Problem Does Architecture Knowledge Solve?
- Knowing the memory map lets you calculate addresses for bare-metal register access.
- Understanding the pipeline helps you predict interrupt latency.
- Knowing the register file helps you read disassembly and write efficient inline assembly.
Key Terms
| Term | Meaning |
|---|---|
| Harvard architecture | Separate instruction and data buses |
| Pipeline | Fetch, Decode, Execute stages running in parallel |
| NVIC | Nested Vectored Interrupt Controller |
| FPU | Hardware Floating-Point Unit (single-precision) |
| MPU | Memory Protection Unit |
| SysTick | 24-bit system timer |
| Thumb-2 | 16/32-bit mixed instruction set used by Cortex-M4 |
Intermediate Level — How It Works
ARM Cortex-M4F Pipeline
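The Cortex-M4 uses a 3-stage pipeline: Fetch → Decode → Execute. There is no dynamic branch predictor, so a taken branch flushes the pipeline and costs extra refill cycles: the Cortex-M4 Technical Reference Manual lists a branch as 1 + P cycles, where P is 1 to 3 depending on the target's alignment and width. The FPU adds an additional execute stage for floating-point operations.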
Key pipeline facts:
- Single-cycle multiply (32-bit)
- 2–12 cycle hardware divide (SDIV/UDIV; the exact count depends on the operand values)
- FPU: single-precision add/multiply in 1 cycle, division in 14 cycles
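- Interrupt entry: 12 cycles. If the interrupted code has active FPU state, the stack frame grows from 8 to 26 words; with lazy stacking (enabled at reset) the FP registers are only reserved at entry, so latency stays at 12 cycles and they are saved later only if the handler itself uses the FPU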
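To turn the cycle counts listed above into time, multiply by the clock period. At the 80 MHz system clock used in the example later in this tutorial, one cycle is 12.5 ns. The helper below is an illustrative sketch (not a TivaWare function); it rounds down and ignores Flash wait states, so treat its results as best-case figures.

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a core clock given in Hz.
 * Illustrative helper; integer result, rounded down. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / clock_hz);
}

/* At 80 MHz:
 *   cycles_to_ns(12, 80000000) -> 150 ns  (interrupt entry)
 *   cycles_to_ns(14, 80000000) -> 175 ns  (single-precision divide) */
```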
CPU Register File
| Register | Name | Purpose |
|---|---|---|
| R0–R3 | Argument/Result | Function arguments and return values |
| R4–R11 | Saved | Must be preserved across function calls (callee-saved) |
| R12 | IP | Intra-procedure scratch register |
| R13 (SP) | Stack Pointer | Points to top of current stack (PSP or MSP) |
| R14 (LR) | Link Register | Stores return address on function/exception entry |
| R15 (PC) | Program Counter | Address of next instruction to fetch |
| xPSR | Program Status | N, Z, C, V flags + ISR number + Thumb bit |
The Cortex-M4F has two stack pointers:
- MSP (Main Stack Pointer): always used in Handler mode, and used in Thread mode after reset (CONTROL.SPSEL = 0)
- PSP (Process Stack Pointer): used in Thread mode when CONTROL.SPSEL = 1, typically by RTOS tasks
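Which stack pointer R13 refers to is decided by the CONTROL register together with the current mode. The sketch below decodes a CONTROL value, assuming you have already read it (for example with `MRS r0, CONTROL`, or CMSIS `__get_CONTROL()` if your project uses CMSIS); the macro and function names are illustrative, not TivaWare API.

```c
#include <stdbool.h>
#include <stdint.h>

/* CONTROL register bits on the Cortex-M4F (ARMv7-M) */
#define CONTROL_NPRIV  (1u << 0)  /* 1 = Thread mode is unprivileged   */
#define CONTROL_SPSEL  (1u << 1)  /* 1 = Thread mode uses PSP          */
#define CONTROL_FPCA   (1u << 2)  /* 1 = floating-point context active */

/* True if SP (R13) currently means PSP. Handler mode always uses MSP;
 * Thread mode uses PSP only when CONTROL.SPSEL = 1. */
static bool sp_is_psp(uint32_t control, bool handler_mode)
{
    return !handler_mode && (control & CONTROL_SPSEL) != 0u;
}
```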
Memory Map
The Cortex-M4 implements a fixed 4 GB address space divided into defined regions:
| Region | Start Address | End Address | Size | Contents |
|---|---|---|---|---|
| Code | 0x00000000 | 0x1FFFFFFF | 512 MB | Flash (256 KB on TM4C123) |
| SRAM | 0x20000000 | 0x3FFFFFFF | 512 MB | SRAM (32 KB on TM4C123) |
| Peripheral | 0x40000000 | 0x5FFFFFFF | 512 MB | All on-chip peripherals |
| External RAM | 0x60000000 | 0x9FFFFFFF | 1 GB | (not used on TM4C123) |
| External Device | 0xA0000000 | 0xDFFFFFFF | 1 GB | (not used on TM4C123) |
| System (PPB) | 0xE0000000 | 0xFFFFFFFF | 512 MB | Private Peripheral Bus at 0xE0000000–0xE00FFFFF (NVIC, SysTick, SCB, MPU, FPU); vendor-specific above that |
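Knowing the region boundaries lets you sanity-check an address before poking it. A host-side sketch that classifies an address by region, following the Cortex-M4 map in the table above; the enum and function names are illustrative. It splits the top 512 MB into the Private Peripheral Bus proper (0xE0000000 to 0xE00FFFFF) and the vendor-specific area above it.

```c
#include <stdint.h>

typedef enum {
    REGION_CODE, REGION_SRAM, REGION_PERIPHERAL, REGION_EXT_RAM,
    REGION_EXT_DEVICE, REGION_PPB, REGION_VENDOR
} MemRegion;

/* Map a 32-bit address to its Cortex-M4 memory-map region.
 * On TM4C123 only part of each region is populated
 * (256 KB Flash, 32 KB SRAM, no external memory). */
static MemRegion region_of(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xA0000000u) return REGION_EXT_RAM;
    if (addr < 0xE0000000u) return REGION_EXT_DEVICE;
    if (addr < 0xE0100000u) return REGION_PPB;
    return REGION_VENDOR;
}
```

For example, `region_of(0x40025000)` (GPIO Port F) yields `REGION_PERIPHERAL`, and `region_of(0xE000E010)` (SysTick) yields `REGION_PPB`.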
TM4C123-specific addresses (GPIO ports shown at their APB apertures):
Flash start : 0x00000000 (256 KB)
SRAM start : 0x20000000 (32 KB)
GPIO Port A : 0x40004000
GPIO Port B : 0x40005000
GPIO Port C : 0x40006000
GPIO Port D : 0x40007000
GPIO Port E : 0x40024000
GPIO Port F : 0x40025000
UART0 : 0x4000C000
SysTick : 0xE000E010
NVIC : 0xE000E100
SCB : 0xE000ED00
MPU : 0xE000ED90
FPU : 0xE000EF30
Harvard vs Von Neumann
The Cortex-M4 uses a modified Harvard architecture:
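- Instruction bus (I-Code): instruction fetches from the Code region (Flash)
- Data bus (D-Code): data and debug accesses to the Code region, such as reading constants stored in Flash
- System bus (AHB): instruction and data accesses to SRAM, peripherals, and the rest of the address space
This allows an instruction fetch and a data access to proceed at the same time, increasing throughput. On the TM4C123, most peripherals sit on the APB (Advanced Peripheral Bus), which is bridged from the AHB. APB accesses are slower than AHB accesses, which is why the GPIO ports can optionally be moved to an AHB aperture for faster back-to-back access.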
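On the TM4C123 each GPIO port has two apertures: the legacy APB addresses listed earlier and an AHB aperture starting at 0x40058000 (`GPIO_PORTA_AHB_BASE` and related macros in TivaWare's `inc/hw_memmap.h`). A port is moved to the AHB with TivaWare's `SysCtlGPIOAHBEnable()` and is then accessed only through the AHB address. The helper below is a hypothetical sketch that computes either base from a port index.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base address of TM4C123 GPIO port 'port' (0 = A ... 5 = F).
 * APB: ports A-D at 0x40004000.., ports E-F at 0x40024000..
 * AHB: ports A-F at 0x40058000.. (port must first be switched to AHB).
 * Illustrative helper; no range check on 'port'. */
static uint32_t gpio_base(unsigned port, bool ahb)
{
    if (ahb)
        return 0x40058000u + port * 0x1000u;
    if (port < 4u)
        return 0x40004000u + port * 0x1000u;
    return 0x40024000u + (port - 4u) * 0x1000u;
}
```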
NVIC (Nested Vectored Interrupt Controller)
- Supports up to 240 external interrupts on Cortex-M4
- TM4C123GH6PM implements 78 interrupts; their IRQ numbers range from 0 to 138, with the unused numbers reserved
- 8 priority levels (3-bit field, bits [7:5] of priority byte; bits [4:0] ignored)
- Priority 0 = highest, Priority 7 = lowest
- Priority grouping (AIRCR.PRIGROUP) splits the implemented priority bits into preemption-priority bits and sub-priority bits
- Vector table starts at 0x00000000 (Flash) after reset; it can be relocated via the VTOR register (0xE000ED08)
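The NVIC's set-enable and priority registers are indexed arithmetically from the IRQ number. A sketch, assuming the ARMv7-M NVIC layout (32-bit set-enable words starting at 0xE000E100, one priority byte per IRQ starting at 0xE000E400) and the TM4C123's 3 implemented priority bits; the helper names are illustrative. In a TivaWare project, `IntEnable()` and `IntPrioritySet()` handle this for you.

```c
#include <stdint.h>

#define NVIC_EN_BASE   0xE000E100u  /* NVIC_EN0: set-enable for IRQs 0-31 */
#define NVIC_PRI_BASE  0xE000E400u  /* one priority byte per IRQ           */

/* Address of the set-enable word that contains 'irq', and the bit to write.
 * On target: HWREG(nvic_en_addr(irq)) = nvic_en_bit(irq); */
static uint32_t nvic_en_addr(uint32_t irq) { return NVIC_EN_BASE + 4u * (irq / 32u); }
static uint32_t nvic_en_bit(uint32_t irq)  { return 1u << (irq % 32u); }

/* Priority byte for level 0 (highest) .. 7 (lowest): only bits [7:5] exist.
 * On target: HWREGB(NVIC_PRI_BASE + irq) = nvic_pri_byte(level); */
static uint8_t nvic_pri_byte(uint32_t level) { return (uint8_t)((level & 7u) << 5); }
```

For example, GPIO Port F is IRQ 30 on the TM4C123, so it is enabled by writing bit 30 (0x40000000) to NVIC_EN0, and priority level 5 is stored as the byte 0xA0.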
SysTick
A 24-bit down-counting timer in the Private Peripheral Bus:
- Counts from RELOAD value down to 0, then reloads
- Can generate an interrupt at zero
- Used for RTOS time-slicing and simple delays
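Because the counter runs from RELOAD down to 0, one period is RELOAD + 1 clocks, and the 24-bit counter caps the period at 2^24 clocks (about 209.7 ms at 80 MHz). A sketch of the reload calculation; the function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* 24-bit RELOAD field */

/* Compute RELOAD for a periodic tick of 'period_us' microseconds.
 * One period = RELOAD + 1 clocks. Returns false if it does not fit. */
static bool systick_reload(uint32_t clock_hz, uint32_t period_us, uint32_t *reload)
{
    uint64_t ticks = ((uint64_t)clock_hz * period_us) / 1000000u;
    if (ticks == 0u || ticks - 1u > SYSTICK_MAX_RELOAD)
        return false;
    *reload = (uint32_t)(ticks - 1u);
    return true;
}
```

At 80 MHz a 1 ms tick gives RELOAD = 79999. If you use TivaWare's `SysTickPeriodSet()` instead, pass the period in clocks (80000), since that function subtracts one itself.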
Advanced Level — Deep Dive
xPSR Register Fields
Bit 31 (N): Negative flag
Bit 30 (Z): Zero flag
Bit 29 (C): Carry flag
Bit 28 (V): Overflow flag
Bit 27 (Q): Saturation flag (DSP)
Bits 26:25 (IT[1:0]): Thumb IT state
Bit 24 (T): Thumb bit — always 1 on Cortex-M4
Bits 19:16 (GE[3:0]): SIMD Greater-than-or-Equal flags
Bits 15:10 (IT[7:2]): Thumb IT state continuation
Bits 8:0 (ISR): Active exception number (0 = Thread mode; IRQ n reads as n + 16)
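When you inspect a stacked xPSR in a fault handler, it helps to pull these fields apart. A sketch of extractor macros following the bit positions listed above, plus reassembly of the split IT field; all names are illustrative.

```c
#include <stdint.h>

/* Field extractors for an xPSR value (bit positions as listed above) */
#define XPSR_N(x)    (((x) >> 31) & 1u)
#define XPSR_Z(x)    (((x) >> 30) & 1u)
#define XPSR_C(x)    (((x) >> 29) & 1u)
#define XPSR_V(x)    (((x) >> 28) & 1u)
#define XPSR_Q(x)    (((x) >> 27) & 1u)
#define XPSR_T(x)    (((x) >> 24) & 1u)
#define XPSR_GE(x)   (((x) >> 16) & 0xFu)
#define XPSR_EXC(x)  ((x) & 0x1FFu)   /* exception number, 0 = Thread mode */

/* Rebuild the 8-bit IT state: IT[7:2] from bits 15:10, IT[1:0] from bits 26:25 */
static uint32_t xpsr_it(uint32_t x)
{
    return (((x >> 10) & 0x3Fu) << 2) | ((x >> 25) & 0x3u);
}
```

For example, a stacked xPSR of 0x0100002E has T = 1 and exception number 46, which is IRQ 30.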
FPU Activation
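The FPU is disabled at reset. You must enable it before executing any floating-point instruction; otherwise the first `float` operation compiled for the hardware FPU faults (a UsageFault with the NOCP flag, escalated to HardFault if UsageFault is not enabled). The FPU is single-precision only, so `double` arithmetic is done in software.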
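```c
#include "inc/hw_types.h"   /* TivaWare HWREG() */

/* Enable FPU: set CP10 and CP11 to Full Access in CPACR */
/* CPACR is at SCB base + 0x088 = 0xE000ED88 */
HWREG(0xE000ED88) |= (0xF << 20);   // bits [23:20] = 1111b
__DSB();                            // Data Synchronization Barrier (CMSIS intrinsic)
__ISB();                            // Instruction Synchronization Barrier (CMSIS intrinsic)
/* Without CMSIS, TivaWare's FPUEnable() (driverlib/fpu.h) performs the CPACR write */
```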
With TI's compiler you usually do not need this: `ResetISR` in TivaWare's `startup_ccs.c` branches to the CCS C runtime entry `_c_int00`, which enables the FPU when the project is built for hardware floating point (`--float_support=FPv4SPD16`, set in the ARM compiler's processor options).
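The CPACR update is plain bit manipulation on bits [23:20]: CP10 occupies [21:20] and CP11 occupies [23:22], and each needs 0b11 for full access. A host-testable sketch of the same read-modify-write and of a check that both coprocessors are fully enabled; the names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPACR_CP10_CP11_MASK (0xFu << 20)  /* CP10 [21:20], CP11 [23:22] */

/* Value to write back to CPACR (0xE000ED88) for full CP10/CP11 access,
 * leaving all other bits unchanged. */
static uint32_t cpacr_fpu_full_access(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_MASK;
}

/* True only when both CP10 and CP11 are set to full access (0b11). */
static bool cpacr_fpu_enabled(uint32_t cpacr)
{
    return (cpacr & CPACR_CP10_CP11_MASK) == CPACR_CP10_CP11_MASK;
}
```

A value of 0x00500000 (both fields 0b01, privileged access only) is reported as not fully enabled, which is why the code above sets all four bits.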
Bit-Banding
The Cortex-M4 supports bit-banding in two 1 MB regions, each paired with a 32 MB alias region in which every bit of the original region maps to a full 32-bit word:
SRAM bit-band region : 0x20000000 – 0x200FFFFF
SRAM bit-band alias : 0x22000000 – 0x23FFFFFF
Peripheral bit-band : 0x40000000 – 0x400FFFFF
Peripheral bit-band alias: 0x42000000 – 0x43FFFFFF
Formula:
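```c
/* Needs inc/hw_memmap.h, inc/hw_gpio.h, inc/hw_types.h, driverlib/gpio.h */
#define BITBAND_PERIPH(addr, bit) \
    (0x42000000 + (((addr) - 0x40000000) * 32) + ((bit) * 4))

/* Atomically set PF1 (Red LED). The masked DATA address (GPIO_PIN_1 << 2)
 * exposes only PF1, which is bit 1 of that word, so the alias targets bit 1. */
#define PF1_BITBAND BITBAND_PERIPH(GPIO_PORTF_BASE + GPIO_O_DATA + \
                                   (GPIO_PIN_1 << 2), 1)

HWREG(PF1_BITBAND) = 1;   // single-bit write, no software read-modify-write
```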
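The same formula works for both bit-band regions, only the bases differ. A host-side sketch that computes the alias address for either region, handy for checking an alias before using it with `HWREG()`; the function name is illustrative, and it assumes `addr` lies inside one of the two 1 MB bit-band regions (there is no range check).

```c
#include <stdint.h>

/* Bit-band alias address for bit 'bit' of the word at 'addr'.
 *   SRAM:       region 0x20000000 -> alias 0x22000000
 *   Peripheral: region 0x40000000 -> alias 0x42000000
 * alias = alias_base + (addr - region_base) * 32 + bit * 4 */
static uint32_t bitband_alias(uint32_t addr, uint32_t bit)
{
    uint32_t region = addr & 0xF0000000u;     /* 0x20000000 or 0x40000000 */
    uint32_t alias  = region + 0x02000000u;   /* 0x22000000 or 0x42000000 */
    return alias + (addr - region) * 32u + bit * 4u;
}
```

For the PF1 example, the masked DATA word is at 0x40025008, so bit 1 aliases to 0x424A0104.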
Stack Layout at Exception Entry
When an interrupt fires, the Cortex-M4 hardware automatically pushes 8 registers:
Stack grows downward:
SP+28 xPSR
SP+24 PC (return address)
SP+20 LR
SP+16 R12
SP+12 R3
SP+8 R2
SP+4 R1
SP+0 R0 ← new SP after push
If the interrupted context has active FPU state (CONTROL.FPCA = 1), an additional 18 words (S0–S15, FPSCR, and one reserved word) are stacked above these, making the full frame 26 words.
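Fault handlers often read this frame to find the PC of the faulting instruction. A sketch of the frame layout as C structs, assuming the standard ARMv7-M ordering shown above; the type names are illustrative. On a real target the frame pointer comes from MSP or PSP, selected by bit 2 of the EXC_RETURN value held in LR on handler entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame, lowest address (new SP) first */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;     /* LR of the interrupted code */
    uint32_t pc;     /* return address */
    uint32_t xpsr;
} ExceptionFrame;

/* Extended frame when FP context is active: basic frame + 18 words */
typedef struct {
    ExceptionFrame basic;
    uint32_t s[16];     /* S0-S15 */
    uint32_t fpscr;
    uint32_t reserved;
} ExceptionFrameFP;
```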
Step-by-Step Example
This example reads the CPU identification and vector table registers and prints them over UART0 using TivaWare's UARTprintf.
```c
/*
 * arch_info.c
 * Reads Cortex-M4 CPUID and prints via UART0
 * Board : EK-TM4C123GXL LaunchPad
 * SDK   : TivaWare_C_Series-2.2.x
 * Note  : utils/uartstdio.c must be compiled into the project
 */
#include <stdint.h>
#include <stdbool.h>
#include "inc/hw_memmap.h"
#include "inc/hw_types.h"
#include "driverlib/sysctl.h"
#include "driverlib/gpio.h"
#include "driverlib/uart.h"
#include "driverlib/pin_map.h"
#include "utils/uartstdio.h"

/* SCB CPUID register: implementer, variant, architecture, part number, revision */
#define SCB_CPUID   0xE000ED00
/* SCB VTOR: Vector Table Offset Register */
#define SCB_VTOR    0xE000ED08
/* CPACR: Coprocessor Access Control Register (CP10/CP11 = FPU) */
#define SCB_CPACR   0xE000ED88

static void UART0_Init(void);

int main(void)
{
    uint32_t ui32CPUID;
    uint32_t ui32VTOR;
    uint32_t ui32SysClk;

    /* Configure system clock to 80 MHz (16 MHz crystal, PLL) */
    SysCtlClockSet(SYSCTL_SYSDIV_2_5 | SYSCTL_USE_PLL |
                   SYSCTL_OSC_MAIN | SYSCTL_XTAL_16MHZ);
    ui32SysClk = SysCtlClockGet();      // Should return 80000000

    UART0_Init();                       // Initialise UART0 for UARTprintf

    /* Read CPUID and VTOR */
    ui32CPUID = HWREG(SCB_CPUID);
    ui32VTOR  = HWREG(SCB_VTOR);

    UARTprintf("=== TM4C123 Architecture Info ===\r\n");
    UARTprintf("System Clock : %u Hz\r\n", ui32SysClk);   // %u: value is unsigned
    UARTprintf("CPUID        : 0x%08X\r\n", ui32CPUID);
    /* Implementer [31:24] = 0x41 = ARM, PartNo [15:4] = 0xC24 = Cortex-M4 */
    UARTprintf("Implementer  : 0x%02X (0x41 = ARM Ltd)\r\n",
               (ui32CPUID >> 24) & 0xFF);
    UARTprintf("Part Number  : 0x%03X (0xC24 = Cortex-M4)\r\n",
               (ui32CPUID >> 4) & 0xFFF);
    UARTprintf("Vector Table : 0x%08X\r\n", ui32VTOR);

    /* Enable FPU manually for demonstration (CP10/CP11 full access) */
    HWREG(SCB_CPACR) |= (0xF << 20);
    UARTprintf("FPU CPACR    : 0x%08X (CP10/CP11 enabled)\r\n",
               HWREG(SCB_CPACR));

    while (1)
    {
        /* Idle loop */
    }
}

static void UART0_Init(void)
{
    /* Enable UART0 and GPIO Port A clocks */
    SysCtlPeripheralEnable(SYSCTL_PERIPH_UART0);
    SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOA);
    while (!SysCtlPeripheralReady(SYSCTL_PERIPH_UART0));
    while (!SysCtlPeripheralReady(SYSCTL_PERIPH_GPIOA));

    /* Configure PA0 = UART0 RX, PA1 = UART0 TX */
    GPIOPinConfigure(GPIO_PA0_U0RX);
    GPIOPinConfigure(GPIO_PA1_U0TX);
    GPIOPinTypeUART(GPIO_PORTA_BASE, GPIO_PIN_0 | GPIO_PIN_1);

    /* Use UARTStdio at 115200, 8N1 */
    UARTStdioConfig(0, 115200, SysCtlClockGet());
}
```
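The program above prints only the implementer and part number. The remaining CPUID fields can be unpacked the same way; a sketch with an illustrative struct and function name. The sample value 0x410FC241 used in the check decodes as an ARM Cortex-M4 r0p1; the value on your board may differ with silicon revision.

```c
#include <stdint.h>

typedef struct {
    uint8_t  implementer;   /* [31:24] 0x41 = ARM                     */
    uint8_t  variant;       /* [23:20] major revision, the N in rNpM  */
    uint8_t  architecture;  /* [19:16] reads as 0xF on ARMv7-M        */
    uint16_t partno;        /* [15:4]  0xC24 = Cortex-M4              */
    uint8_t  revision;      /* [3:0]   minor revision, the M in rNpM  */
} CpuId;

static CpuId cpuid_decode(uint32_t v)
{
    CpuId id;
    id.implementer  = (uint8_t)(v >> 24);
    id.variant      = (uint8_t)((v >> 20) & 0xFu);
    id.architecture = (uint8_t)((v >> 16) & 0xFu);
    id.partno       = (uint16_t)((v >> 4) & 0xFFFu);
    id.revision     = (uint8_t)(v & 0xFu);
    return id;
}
```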
Summary
| Key Point | Details |
|---|---|
| CPU core | ARM Cortex-M4F (with FPU) |
| Clock speed | 80 MHz (16 MHz crystal, 400 MHz PLL divided by 5 via SYSCTL_SYSDIV_2_5) |
| Flash | 256 KB at 0x00000000 |
| SRAM | 32 KB at 0x20000000 |
| Peripheral base | 0x40000000 |
| PPB (NVIC, SCB) | 0xE0000000 |
| Pipeline stages | 3 (Fetch, Decode, Execute) |
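| Interrupt latency | 12 cycles; with active FPU state the frame grows to 26 words, but lazy stacking defers the extra 18-word save |
| Stack pointers | MSP (Handler mode, and Thread mode by default) and PSP (Thread mode, typically RTOS tasks) |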
| Instruction set | Thumb-2 (16/32-bit mixed) |