Stage 09

NFS Root & initramfs

Speed up embedded Linux development with NFS root filesystem booting, build and customize initramfs/initrd images with cpio, understand pivot_root and switch_root, and use in-memory rootfs for resource-constrained targets.


NFS Root Booting

{:.gc-basic}

Basic

Booting an embedded target from an NFS (Network File System) root during development is one of the most powerful techniques in the embedded Linux workflow. Instead of reflashing the target’s storage on every code change, you edit files on your development host and the target picks them up instantly — or on next reboot.

Why NFS Root for Development

| Pain Point | Without NFS Root | With NFS Root |
| --- | --- | --- |
| Deploy new rootfs | Reflash SD/NAND (30–120 seconds) | Edit files on host, reboot target (< 10 seconds) |
| Debug crashes | Pull SD card, mount on host | gdb, strace, core dumps accessible live |
| Test library changes | Flash entire image | Replace .so on host, restart service on target |
| Disk space on target | Limited to flash size | Effectively unlimited (host disk) |
| Multiple targets | Flash each separately | Same NFS export serves all targets |

NFS Server Setup (on the Development Host)

# Install NFS server (Debian/Ubuntu)
sudo apt-get install nfs-kernel-server

# Create the rootfs export directory
sudo mkdir -p /srv/nfs/target-rootfs

# Populate with your cross-compiled rootfs
sudo rsync -av $ROOTFS/ /srv/nfs/target-rootfs/

# Edit /etc/exports — grant the target access
# Format: directory  client(options)
sudo tee /etc/exports << 'EOF'
/srv/nfs/target-rootfs  192.168.1.100(rw,sync,no_subtree_check,no_root_squash)

# Allow any host on the subnet (useful for DHCP targets)
/srv/nfs/target-rootfs  192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
EOF

# Apply the new exports
sudo exportfs -arv
# exporting 192.168.1.0/24:/srv/nfs/target-rootfs

# Restart NFS server
sudo systemctl restart nfs-kernel-server

# Verify
showmount -e localhost
# Export list for localhost:
# /srv/nfs/target-rootfs  192.168.1.0/24

Important /etc/exports options:

| Option | Meaning |
| --- | --- |
| rw | Allow read and write access |
| sync | Write data to disk before acknowledging (safer) |
| no_subtree_check | Disables subtree checking (improves reliability) |
| no_root_squash | Allow root on target to access files as root on host. Required for NFS root |
| root_squash | Default. Maps target root to nobody. Breaks NFS rootfs |
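
As a rough illustration of the options above, the following sketch checks one /etc/exports entry for the two options an NFS root export depends on. The function name `check_export_opts` is hypothetical (it is not part of nfs-utils), and the parser assumes the simple `directory client(options)` form with a single client per line.

```shell
#!/bin/sh
# check_export_opts LINE
# Hypothetical helper: inspects one /etc/exports entry of the form
#   directory  client(opt1,opt2,...)
# and reports options that matter for an NFS root export.
check_export_opts() {
    line=$1
    # Extract the text between the first '(' and the next ')'
    opts=$(printf '%s\n' "$line" | sed -n 's/^[^(]*(\([^)]*\)).*$/\1/p')
    status=0
    case ",$opts," in
        *,no_root_squash,*) ;;
        *) echo "missing no_root_squash (target root would be mapped to nobody)"
           status=1 ;;
    esac
    case ",$opts," in
        *,rw,*) ;;
        *) echo "missing rw (export would be read-only)"
           status=1 ;;
    esac
    if [ "$status" -eq 0 ]; then
        echo "ok"
    fi
    return "$status"
}
```

For example, `check_export_opts '/srv/nfs/target-rootfs  192.168.1.100(rw,sync,no_subtree_check,no_root_squash)'` prints `ok`, while an entry with only `(ro,sync)` prints both warnings and returns non-zero.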

Kernel Configuration for NFS Root

# In kernel menuconfig (make menuconfig):
# File systems → Network File Systems:
CONFIG_NFS_FS=y            # NFS client support (must be built-in, not module)
CONFIG_NFS_V3=y            # NFSv3
CONFIG_NFS_V4=y            # NFSv4 (optional but recommended)
CONFIG_ROOT_NFS=y          # Support for NFS root filesystem

# Networking:
CONFIG_IP_PNP=y            # IP autoconfiguration at boot
CONFIG_IP_PNP_DHCP=y       # DHCP-based IP configuration (optional)
CONFIG_IP_PNP_BOOTP=y      # BOOTP (optional)

# Note: when booting without an initramfs, NFS_FS, ROOT_NFS and the
# Ethernet driver MUST be built-in (=y), NOT modules (=m).
# The kernel cannot load modules before the rootfs is mounted!

Kernel Command Line for NFS Boot

# Static IP — most reliable for development
root=/dev/nfs \
nfsroot=192.168.1.10:/srv/nfs/target-rootfs,v3,tcp \
ip=192.168.1.100:192.168.1.10:192.168.1.1:255.255.255.0:target-board:eth0:off \
console=ttyAMA0,115200 \
rw

# ip= format: client:server:gateway:netmask:hostname:device:autoconf

# DHCP — simpler but requires DHCP server to assign same IP
root=/dev/nfs \
nfsroot=192.168.1.10:/srv/nfs/target-rootfs \
ip=dhcp \
console=ttyAMA0,115200 \
rw
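
Because the colon-separated `ip=` string is easy to get wrong by hand, a small generator can help. This is a minimal sketch under stated assumptions: `nfs_bootargs` is a hypothetical helper name, and it always emits the static-IP form shown above with `v3,tcp` and autoconf `off`.

```shell
#!/bin/sh
# nfs_bootargs CLIENT SERVER GATEWAY NETMASK HOSTNAME DEVICE EXPORT_DIR
# Hypothetical helper: prints the NFS-root part of a kernel command line.
# Field order follows ip=client:server:gateway:netmask:hostname:device:autoconf
nfs_bootargs() {
    if [ "$#" -ne 7 ]; then
        echo "usage: nfs_bootargs CLIENT SERVER GATEWAY NETMASK HOSTNAME DEVICE EXPORT_DIR" >&2
        return 2
    fi
    client=$1 server=$2 gateway=$3 netmask=$4
    host=$5 device=$6 export_dir=$7
    printf 'root=/dev/nfs rw nfsroot=%s:%s,v3,tcp ip=%s:%s:%s:%s:%s:%s:off\n' \
        "$server" "$export_dir" \
        "$client" "$server" "$gateway" "$netmask" "$host" "$device"
}
```

Calling `nfs_bootargs 192.168.1.100 192.168.1.10 192.168.1.1 255.255.255.0 target-board eth0 /srv/nfs/target-rootfs` reproduces the static-IP arguments above; append `console=` and any other parameters separately.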

U-Boot Environment for NFS Booting

# Set in U-Boot prompt:
setenv serverip  192.168.1.10
setenv ipaddr    192.168.1.100
setenv gatewayip 192.168.1.1
setenv netmask   255.255.255.0

setenv nfsroot '/srv/nfs/target-rootfs'

setenv nfsargs 'setenv bootargs root=/dev/nfs rw \
    nfsroot=${serverip}:${nfsroot},v3,tcp \
    ip=${ipaddr}:${serverip}:${gatewayip}:${netmask}:target:eth0:off \
    console=ttyAMA0,115200'

setenv bootcmd 'run nfsargs; tftp ${loadaddr} zImage; tftp ${fdtaddr} target.dtb; \
    bootz ${loadaddr} - ${fdtaddr}'

saveenv
boot

NFS Root Troubleshooting

{:.gc-mid}

Intermediate

Common Errors and Fixes

| Kernel Message | Likely Cause | Fix |
| --- | --- | --- |
| IP-Config: no response after N seconds | Target cannot reach DHCP server, or static IP config is wrong | Check network cable, switch, and ip= parameters in kernel cmdline |
| nfs: server not responding | Host firewall blocking NFS ports | Open ports 111 (rpcbind/portmap), 2049 (nfs), and the mountd port (20048 on some distributions; confirm with rpcinfo -p) |
| nfs: access denied by server | Client IP or export path not matched in /etc/exports, or root_squash in effect | Correct the entry, add no_root_squash, run exportfs -arv |
| VFS: Unable to mount root fs via NFS | NFS client not built into the kernel (=m instead of =y) | Rebuild kernel with CONFIG_NFS_FS=y and CONFIG_ROOT_NFS=y |
| Mounts read-only | Missing rw in kernel cmdline | Add rw to bootargs |
| Stale NFS file handle | Host-side rootfs was deleted and recreated while target was running | Reboot the target; update the export in place (e.g. with rsync) instead of replacing the directory |

Diagnostics on the Host

# Check NFS exports are active
showmount -e localhost

# Check portmap/rpcbind is running
rpcinfo -p localhost

# Verify mount daemon
rpcinfo -p | grep mount

# Check NFS server logs
journalctl -u nfs-kernel-server -f
tail -f /var/log/syslog | grep nfs

# Test mount from a second host machine
sudo mkdir -p /mnt/test
sudo mount -t nfs -o vers=3 192.168.1.10:/srv/nfs/target-rootfs /mnt/test
ls /mnt/test
sudo umount /mnt/test

Firewall Rules (iptables)

# Allow NFS traffic from the target subnet
iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 2049 -j ACCEPT
iptables -A INPUT -s 192.168.1.0/24 -p udp --dport 2049 -j ACCEPT
iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 111  -j ACCEPT
iptables -A INPUT -s 192.168.1.0/24 -p udp --dport 111  -j ACCEPT
# NFSv4 needs only TCP port 2049. NFSv3 also needs rpcbind (111) and the
# mountd port, which is dynamic unless pinned; find it with: rpcinfo -p | grep mountd

NFSv3 vs NFSv4

| Feature | NFSv3 | NFSv4 |
| --- | --- | --- |
| Port | Dynamic (portmapper) | TCP 2049 only (firewall-friendly) |
| Stateful | No | Yes |
| POSIX ACL support | Limited | Yes |
| Locking | External (lockd) | Built-in |
| Kernel rootfs boot | Well-supported | Supported (kernel 3.x+) |
| Recommended for NFS root | Yes (simpler) | Yes (modern systems) |

initramfs Basics

{:.gc-basic}

Basic

initramfs (Initial RAM Filesystem) is an in-memory filesystem that the Linux kernel unpacks before mounting the real rootfs. It provides an early userspace environment for:

  • Mounting the real rootfs (loading drivers, decrypting LUKS, assembling RAID)
  • Hardware initialisation that cannot be done from kernel space
  • Providing a rescue environment

initramfs vs initrd — Historical Note

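| | initrd | initramfs |
| --- | --- | --- |
| Format | Block device image (e.g. ext2) | cpio archive |
| Kernel version | 2.4 and earlier | 2.6+ |
| Mount method | RAM disk block device (/dev/ram0) | Extracted directly into rootfs (ramfs/tmpfs) |
| Entry point | /linuxrc | /init |
| Cleanup | pivot_root | switch_root |
| Still used today | No (obsolete) | Yes (standard) |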

The term “initrd” is still commonly used to mean initramfs, but they are different things. When QEMU is given -initrd, or U-Boot's bootm/bootz is given a ramdisk address, the image passed is usually a gzip-compressed cpio archive (i.e., an initramfs).
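
When it is unclear which kind of image a file named "initrd" really is, the leading magic bytes usually settle it. The sketch below is an illustration, not a replacement for `file(1)`: `image_kind` is a hypothetical helper that recognises only three signatures (newc cpio starts with the ASCII string `070701`, gzip with bytes `1f 8b`, and an ext2 superblock carries `53 ef` at byte offset 1080). A result of `gzip` says nothing about what is inside the compressed stream.

```shell
#!/bin/sh
# image_kind FILE
# Hypothetical helper: classify a boot image by its magic bytes.
# Prints one of: cpio-newc, gzip, ext2, unknown
image_kind() {
    img=$1
    # First six bytes as a hex string, e.g. "303730373031" for "070701"
    head6=$(od -An -tx1 -N6 "$img" | tr -d ' \n')
    case $head6 in
        303730373031*) echo "cpio-newc"; return 0 ;;
        1f8b*)         echo "gzip";      return 0 ;;
    esac
    # ext2/3/4 superblock magic 0xEF53, stored little-endian at offset 1080
    ext=$(od -An -tx1 -j1080 -N2 "$img" 2>/dev/null | tr -d ' \n')
    if [ "$ext" = "53ef" ]; then
        echo "ext2"
        return 0
    fi
    echo "unknown"
    return 1
}
```

Running `image_kind initramfs.cpio.gz` on the archive built later in this article would be expected to print `gzip`, and `image_kind initramfs.cpio` to print `cpio-newc`.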

initramfs is Embedded in (or Alongside) the Kernel

Two deployment modes:

  1. Separate file (initramfs.cpio.gz) — passed by bootloader alongside the kernel image. The kernel decompresses and unpacks it into an internal tmpfs.
  2. Built into the kernel (CONFIG_INITRAMFS_SOURCE) — the cpio archive is compiled into the kernel image itself. Results in a single zImage/bzImage that contains everything.

The /init Script — PID 1

The kernel executes /init (or the path given in the rdinit= cmdline parameter) inside the initramfs. It runs as PID 1; when it later execs switch_root, the real init takes over that same PID.

#!/bin/sh
# /init — minimal initramfs init script

# Mount essential virtual filesystems
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev

# Load any necessary kernel modules
# modprobe ext4
# modprobe mmc_block

# Wait for block device to appear
for i in $(seq 1 20); do
    [ -b /dev/mmcblk0p2 ] && break
    sleep 0.1
done

# Mount the real rootfs
mkdir -p /newroot
mount -t ext4 /dev/mmcblk0p2 /newroot

# Hand off to the real init
exec switch_root /newroot /sbin/init
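
The polling loop in the script above can be factored into a reusable helper. This is a minimal sketch: `wait_for_path` is a hypothetical name, it tests for existence with `-e` (an /init script would more likely use `-b` for a block device), and it assumes a `sleep` that accepts fractional seconds, as the script above already does.

```shell
#!/bin/sh
# wait_for_path PATH [TRIES]
# Hypothetical helper: poll every 0.1 s until PATH exists.
# Returns 0 as soon as it appears, 1 after TRIES attempts (default 20, about 2 s).
wait_for_path() {
    path=$1
    tries=${2:-20}
    while [ "$tries" -gt 0 ]; do
        [ -e "$path" ] && return 0
        sleep 0.1
        tries=$((tries - 1))
    done
    return 1
}
```

In an /init script this would read `wait_for_path /dev/mmcblk0p2 50 || echo "root device did not appear"`, which makes the timeout case explicit instead of falling through to a failing mount.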

Building initramfs with cpio

{:.gc-mid}

Intermediate

Minimal initramfs Directory Structure

INITRAMFS=$HOME/initramfs
mkdir -p $INITRAMFS/{bin,sbin,dev,proc,sys,newroot,lib,lib64}

# Device nodes (or rely on devtmpfs)
sudo mknod -m 600 $INITRAMFS/dev/console c 5 1
sudo mknod -m 666 $INITRAMFS/dev/null    c 1 3

# Copy statically-linked busybox
cp busybox-1.36.1/busybox $INITRAMFS/bin/busybox

# Symlink every applet the /init scripts call
# (echo and [ are ash builtins by default and need no link)
for applet in sh mount mkdir mknod seq sleep cat tr grep cut modprobe switch_root; do
    ln -s busybox $INITRAMFS/bin/$applet
done
ln -s ../bin/busybox $INITRAMFS/sbin/switch_root

# Copy the /init script (shown above)
cp init.sh $INITRAMFS/init
chmod +x $INITRAMFS/init
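
Before packing the tree, a quick sanity check catches the most common mistakes (a missing or non-executable /init, missing mount-point directories). The following is a sketch only; `check_initramfs_tree` is a hypothetical helper, and the list of required entries simply mirrors what the /init script above relies on.

```shell
#!/bin/sh
# check_initramfs_tree DIR
# Hypothetical helper: verify that a staging directory has the pieces
# the minimal /init script needs. Prints one line per problem found.
check_initramfs_tree() {
    root=$1
    rc=0
    if [ ! -f "$root/init" ]; then
        echo "missing /init"
        rc=1
    elif [ ! -x "$root/init" ]; then
        echo "/init is not executable"
        rc=1
    fi
    # Mount points used by /init must already exist in the archive
    for d in dev proc sys; do
        if [ ! -d "$root/$d" ]; then
            echo "missing /$d"
            rc=1
        fi
    done
    # The #!/bin/sh interpreter line needs a shell (file or symlink)
    if [ ! -e "$root/bin/sh" ] && [ ! -L "$root/bin/sh" ]; then
        echo "missing /bin/sh"
        rc=1
    fi
    return "$rc"
}
```

Running `check_initramfs_tree "$INITRAMFS"` prints nothing and returns 0 when the tree looks complete.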

Building the cpio Archive

cd $INITRAMFS

# newc format: the only format supported by Linux kernel
# The | gzip pipeline produces a .cpio.gz accepted by most bootloaders
find . | cpio -H newc -o | gzip -9 > ../initramfs.cpio.gz

# Alternative: lz4 for faster decompression at boot
find . | cpio -H newc -o | lz4 -l > ../initramfs.cpio.lz4

# Uncompressed (needs no CONFIG_RD_* decompressor, but is larger to load)
find . | cpio -H newc -o > ../initramfs.cpio

# Check the archive
ls -lh ../initramfs.cpio.gz

# Inspect contents
gunzip -c ../initramfs.cpio.gz | cpio -t | head -30

Building initramfs Into the Kernel

# In kernel .config or menuconfig:
# General setup → Initramfs source file(s)
CONFIG_INITRAMFS_SOURCE="/home/dev/initramfs"
# OR point to the cpio archive:
CONFIG_INITRAMFS_SOURCE="/home/dev/initramfs.cpio.gz"

# Then rebuild the kernel — the archive is embedded in vmlinux
make -j$(nproc) zImage

Passing initramfs to QEMU

# Separate initramfs file — bootloader/QEMU passes alongside kernel
qemu-system-arm \
    -M vexpress-a9 \
    -kernel zImage \
    -initrd initramfs.cpio.gz \
    -append "console=ttyAMA0,115200 rdinit=/init" \
    -nographic

# For QEMU x86_64
qemu-system-x86_64 \
    -kernel bzImage \
    -initrd initramfs.cpio.gz \
    -append "console=ttyS0 rdinit=/init" \
    -nographic

Kernel Config for initramfs Decompression

# Enable the decompressor matching your initramfs compression
# (an uncompressed cpio needs none)
CONFIG_RD_GZIP=y     # gzip
CONFIG_RD_BZIP2=y    # bzip2
CONFIG_RD_LZMA=y     # lzma
CONFIG_RD_XZ=y       # xz
CONFIG_RD_LZO=y      # lzo
CONFIG_RD_LZ4=y      # lz4
CONFIG_RD_ZSTD=y     # zstd
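
A mismatch between the archive's compression and the kernel's enabled decompressors leaves the kernel unable to unpack the initramfs. The sketch below maps an archive name to the option listed above and looks for it in a kernel `.config`. Both function names are hypothetical, and the mapping relies purely on the file suffix, which is an assumption about how the archive was named.

```shell
#!/bin/sh
# rd_option_for IMAGE
# Hypothetical helper: map an initramfs file suffix to its CONFIG_RD_* option.
rd_option_for() {
    case $1 in
        *.gz)   echo CONFIG_RD_GZIP ;;
        *.bz2)  echo CONFIG_RD_BZIP2 ;;
        *.lzma) echo CONFIG_RD_LZMA ;;
        *.xz)   echo CONFIG_RD_XZ ;;
        *.lzo)  echo CONFIG_RD_LZO ;;
        *.lz4)  echo CONFIG_RD_LZ4 ;;
        *.zst)  echo CONFIG_RD_ZSTD ;;
        *.cpio) echo none ;;
        *)      echo unknown; return 1 ;;
    esac
}

# check_rd_support IMAGE KERNEL_CONFIG
# Reports whether KERNEL_CONFIG enables the decompressor IMAGE needs.
check_rd_support() {
    opt=$(rd_option_for "$1") || { echo "unrecognised suffix: $1"; return 2; }
    if [ "$opt" = none ]; then
        echo "uncompressed: no decompressor needed"
        return 0
    fi
    if grep -q "^${opt}=y" "$2"; then
        echo "$opt=y found"
    else
        echo "$opt is not enabled"
        return 1
    fi
}
```

For example, `check_rd_support initramfs.cpio.lz4 .config` reports `CONFIG_RD_LZ4 is not enabled` when the option is missing from the kernel configuration.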

pivot_root and switch_root

{:.gc-adv}

Advanced

After the /init script in the initramfs has mounted the real rootfs, it must hand control over to the real PID 1. This is a two-step problem: PID 1 cannot simply exec a new binary with a different rootfs because it would still be running with the initramfs mounted. Two mechanisms exist.

pivot_root — Swap the Root Mount

pivot_root changes the root mount of the calling process’s mount namespace. The old root is moved to a specified directory, allowing it to be unmounted later.

# Classic pivot_root sequence (used by older distributions)
# Assumes real rootfs is already mounted at /newroot

# Real rootfs must be a different mount than initramfs
mount --bind /newroot /newroot

# Move to new root and call pivot_root
cd /newroot
pivot_root . mnt          # new_root=., put_old=./mnt (oldroot goes to /mnt)

# Now we're in the new rootfs; old initramfs root is at /mnt.
# exec keeps PID 1: unmount the old root, then start the real init.
exec chroot . sh -c 'umount /mnt; exec /sbin/init' \
    <dev/console >dev/console 2>&1

pivot_root requirements:

  • new_root must be a mount point, and must not be on the same mount as the current root
  • The process must be in the new root’s directory
  • put_old must be under new root
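
The mount-point requirement is the one that most often trips up a pivot_root call, and it can be checked beforehand. This sketch compares device numbers of a directory and its parent; `is_mount_point` is a hypothetical helper, it assumes a `stat` that supports `-c` (GNU coreutils and BusyBox do), and it will not detect a bind mount of the same filesystem, which is where `mountpoint -q` is the more reliable tool if it is available.

```shell
#!/bin/sh
# is_mount_point DIR
# Hypothetical helper: succeed if DIR sits on a different device than its
# parent, or if DIR is the root directory itself.
is_mount_point() {
    dir=$1
    [ -d "$dir" ] || return 1
    dev_self=$(stat -L -c %d "$dir") || return 1
    dev_parent=$(stat -L -c %d "$dir/..") || return 1
    ino_self=$(stat -L -c %i "$dir") || return 1
    ino_parent=$(stat -L -c %i "$dir/..") || return 1
    # Different device: a mount boundary lies between DIR and its parent.
    # Same inode as the parent: DIR is "/" (where ".." points back to itself).
    [ "$dev_self" != "$dev_parent" ] || [ "$ino_self" = "$ino_parent" ]
}
```

A script could then guard the call with `is_mount_point /newroot || mount --bind /newroot /newroot` before changing directory and invoking pivot_root.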

switch_root — The Modern Approach

switch_root (provided by util-linux and BusyBox) is simpler and designed specifically for initramfs transition. It:

  1. chdirs into the new root
  2. Deletes all files in the old (initramfs) rootfs to free RAM
  3. Moves the new root mount onto / (MS_MOVE) and chroots into it
  4. execs the new init

# switch_root usage: switch_root NEW_ROOT INIT [ARGS...]
exec switch_root /newroot /sbin/init

Under the hood:

// simplified switch_root logic
chdir(newroot);
mount(newroot, "/", NULL, MS_MOVE, NULL);   // move mount to /
chroot(".");
execv(init, argv);                          // exec new init as PID 1

pivot_root vs switch_root

| | pivot_root | switch_root |
| --- | --- | --- |
| Handles initramfs | No (the initial rootfs cannot be pivoted away from or unmounted) | Yes (explicitly designed for initramfs) |
| RAM cleanup | Manual umount needed | Automatically frees initramfs memory |
| Complexity | Higher | Lower |
| Used by | Legacy distributions, container runtimes | Modern initramfs scripts, dracut, mkinitramfs |

Complete Realistic /init Script

#!/bin/sh
# /init — production-quality initramfs init script

set -e

rescue_shell() {
    echo "ERROR: $1"
    echo "Dropping to rescue shell."
    # exec replaces PID 1 with the shell; exiting that shell panics the kernel
    exec /bin/sh
}

# Mount virtual filesystems
mount -t proc proc /proc 2>/dev/null || rescue_shell "cannot mount /proc"
mount -t sysfs sysfs /sys 2>/dev/null || rescue_shell "cannot mount /sys"
mount -t devtmpfs devtmpfs /dev 2>/dev/null || \
    ( mknod /dev/console c 5 1; mknod /dev/null c 1 3 )

# Parse kernel command line
ROOT=$(cat /proc/cmdline | tr ' ' '\n' | grep '^root=' | cut -d= -f2-)
ROOTFSTYPE=$(cat /proc/cmdline | tr ' ' '\n' | grep '^rootfstype=' | cut -d= -f2-)
INIT=$(cat /proc/cmdline | tr ' ' '\n' | grep '^init=' | cut -d= -f2-)
INIT=${INIT:-/sbin/init}

[ -z "$ROOT" ] && rescue_shell "root= not specified in kernel cmdline"

# Load required modules (example: SD card on i.MX6)
# "|| true" keeps set -e from aborting when a driver is built-in or absent
modprobe mmc_block 2>/dev/null || true
modprobe sdhci_esdhc_imx 2>/dev/null || true

# Wait for root device (up to 5 seconds)
TIMEOUT=50
while [ $TIMEOUT -gt 0 ]; do
    [ -b "$ROOT" ] && break
    sleep 0.1
    TIMEOUT=$((TIMEOUT - 1))
done
[ $TIMEOUT -eq 0 ] && rescue_shell "root device $ROOT did not appear"

# Mount real rootfs
mkdir -p /newroot
FSTYPE_OPT=""
[ -n "$ROOTFSTYPE" ] && FSTYPE_OPT="-t $ROOTFSTYPE"
mount $FSTYPE_OPT "$ROOT" /newroot || rescue_shell "cannot mount $ROOT"

# Optional: fsck before mounting rw
# e2fsck -p "$ROOT"
# mount -o remount,rw /newroot

# Hand off to real init — frees initramfs memory
echo "Switching to real rootfs on $ROOT, init=$INIT"
exec switch_root /newroot "$INIT" || rescue_shell "switch_root failed"
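
The three nearly identical cmdline-parsing lines in the script above could be folded into one helper. This is a sketch under stated assumptions: `cmdline_get` is a hypothetical name, the optional second argument exists only so the function can be exercised against a test file instead of /proc/cmdline, and taking the last occurrence of a repeated key is a choice made here, not something the script above specifies.

```shell
#!/bin/sh
# cmdline_get KEY [FILE]
# Hypothetical helper: print the value of KEY=value from a kernel command
# line (default /proc/cmdline). Prints nothing if KEY is absent.
# If KEY appears more than once, the last occurrence is used.
cmdline_get() {
    key=$1
    file=${2:-/proc/cmdline}
    # One parameter per line, strip the "KEY=" prefix from matching lines
    tr ' ' '\n' < "$file" | sed -n "s/^${key}=//p" | tail -n 1
}
```

With this in place the script body becomes `ROOT=$(cmdline_get root)`, `ROOTFSTYPE=$(cmdline_get rootfstype)` and `INIT=$(cmdline_get init)`. Values that themselves contain `=` (such as `root=PARTUUID=...`) are returned intact, although the `[ -b "$ROOT" ]` wait loop would still need extra handling for them.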

dracut and mkinitramfs

{:.gc-adv}

Advanced

Production Linux distributions do not write initramfs scripts by hand — they use framework tools that assemble the initramfs from module definitions.

dracut (Red Hat / Fedora / RHEL / SUSE)

dracut is a modular initramfs builder. Each module in /usr/lib/dracut/modules.d/ provides hooks and install scripts:

# Regenerate initramfs for current kernel
dracut --force

# Verbose output
dracut --verbose --force /boot/initramfs-$(uname -r).img $(uname -r)

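# List the dracut modules available on this system
dracut --list-modules

# See what actually went into an existing image
lsinitrd /boot/initramfs-$(uname -r).img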

# Build a stripped-down initramfs for an embedded target
dracut \
    --no-compress \
    --add "base rootfs-block" \
    --omit "plymouth resume" \
    --filesystems "ext4 squashfs" \
    --kmoddir /path/to/cross/modules \
    initramfs.img

Debugging dracut initramfs:

# Boot parameters: drop to a shell at a chosen point in the initramfs
# Add to kernel cmdline:
rd.break            # break at the end, just before switch_root
rd.break=pre-mount  # break before the real rootfs is mounted
rd.break=pre-udev # break before udev
rd.break=cmdline  # break right after cmdline processing

# Disable dracut compression for easier inspection
dracut --no-compress initramfs.img

mkinitramfs (Debian / Ubuntu)

# Generate for current kernel
sudo mkinitramfs -o /boot/initrd.img-$(uname -r)

# Verbose
sudo mkinitramfs -v -o initramfs.img $(uname -r)

# Inspect an existing initramfs
unmkinitramfs /boot/initrd.img-$(uname -r) /tmp/initramfs-extracted
ls /tmp/initramfs-extracted/

# Hooks live in:
ls /etc/initramfs-tools/hooks/     # local hooks (run at build time)
ls /etc/initramfs-tools/scripts/   # scripts (run at boot time inside initramfs)

Adding a custom hook:

#!/bin/sh
# /etc/initramfs-tools/hooks/mydriver (must be executable)
PREREQS=""
prereqs() { echo "$PREREQS"; }
# mkinitramfs first calls each hook with "prereqs" to order them
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac
. /usr/share/initramfs-tools/hook-functions

# Copy a binary and its dependencies into the initramfs
copy_exec /usr/bin/myutil /usr/bin
# Copy a module
manual_add_modules mydriver

Module Dependencies in initramfs

# Add modules to initramfs (Debian); tee -a appends with root privileges
printf '%s\n' mmc_block sdhci usb_storage | sudo tee -a /etc/initramfs-tools/modules

# Rebuild
sudo update-initramfs -u -k all

Interview Q&A

{:.gc-iq}


Q1 — What is the difference between initramfs and initrd?

initrd (initial RAM disk) was the original mechanism: the kernel loaded a filesystem image (typically ext2) into a RAM disk block device (/dev/ram0), mounted it as the root, ran /linuxrc, then used pivot_root to switch to the real rootfs. The RAM disk stayed resident in memory until it was explicitly freed. initramfs (since Linux 2.6) is a cpio archive that the kernel extracts directly into an internal ramfs/tmpfs at boot time. It runs /init as PID 1, and switch_root moves to the real rootfs and frees the initramfs memory. initramfs is simpler and more memory-efficient because it needs no block device or filesystem driver and avoids caching the same data twice. The term “initrd” is still widely used to mean initramfs.

Q2 — Why is NFS root valuable for embedded development but not for production?

In development, NFS root eliminates reflash cycles — edit a file on the host and the target sees it immediately. It enables large root filesystems with full debug tools (gdb, valgrind, strace) without consuming embedded flash. In production, NFS root requires a permanent network connection to the NFS server; if the network goes down, the system becomes non-functional. It also has latency inherent to network I/O, requires a server on the network at all times, and raises security concerns (NFS is not encrypted).

Q3 — What is the difference between pivot_root and switch_root?

pivot_root is a generic syscall that swaps the root mount point: the old root becomes accessible at a specified directory. It was used in the initrd era and requires the new root to be a separate mount. switch_root is a utility designed specifically for initramfs: it deletes all files in the current rootfs (recovering the RAM they used), performs MS_MOVE to replace / with the new root, then execs the new init as PID 1. Like pivot_root, switch_root needs the new root to be a mount point, but unlike pivot_root it properly handles the initramfs case, where the current root is the kernel's initial rootfs and cannot be pivoted away from or unmounted.

Q4 — What must /init in an initramfs do before calling switch_root?

/init must: (1) mount /proc and /sys so that kernel information is accessible; (2) set up /dev (mount devtmpfs or create essential device nodes like /dev/console); (3) load any kernel modules needed to access the root device (storage drivers, filesystem drivers); (4) wait for the root block device to appear (udev settling or polling loop); (5) run fsck if the filesystem may need repair; (6) mount the real rootfs at a temporary directory (e.g., /newroot); and finally (7) call exec switch_root /newroot /sbin/init to hand off. If any step fails, it should drop to a rescue shell rather than panicking silently.

Q5 — How does the kernel find and load the initramfs?

There are two ways. First, if CONFIG_INITRAMFS_SOURCE points to a directory or cpio archive at kernel build time, the archive is compiled directly into the kernel image, so the kernel always has it and no bootloader action is needed. Second, the bootloader (U-Boot, GRUB, QEMU -initrd) passes the cpio archive as a separate blob in memory. The kernel receives its physical address and size from the bootloader via ATAGs or the Device Tree (the linux,initrd-start and linux,initrd-end properties of the /chosen node), decompresses it, and extracts it into the internal tmpfs before executing /init.

Q6 — What NFS security considerations apply when using NFS root?

Key concerns: (1) no_root_squash is required for NFS root, meaning the target’s root user can read/write any file on the server’s export — a compromised target can damage the host’s exported directory. Mitigate by exporting to a specific IP only, never to *. (2) NFS traffic is unencrypted by default — use on an isolated lab network or a VLAN. (3) The target can read kernel source, debug symbols, and credentials stored in the NFS export. (4) NFSv4 with Kerberos (krb5i/krb5p) provides authentication and integrity, but adds configuration complexity rarely worthwhile for a development environment.

Q7 — How do you debug an initramfs that fails to mount the root filesystem?

# 1. Boot with a kernel cmdline breakpoint (dracut)
rd.break

# 2. Boot with rdinit=/bin/sh to get a shell directly in the initramfs
#    (init= only selects the program run on the real rootfs)
rdinit=/bin/sh

# 3. Inspect from the rescue shell:
cat /proc/cmdline           # verify root= parameter was parsed correctly
ls /dev/mmcblk*             # did the block device appear?
cat /proc/modules           # are storage modules loaded?
dmesg | grep -i "mmc\|sdhci\|ext4\|nfs"   # driver probe messages

# 4. Manually mount to test:
mount -t ext4 /dev/mmcblk0p2 /newroot && echo "Mount OK"

# 5. Check cpio archive integrity on host:
gunzip -c initramfs.cpio.gz | cpio -t | grep init
# Should show:
# ./init
# (with no leading path issues)

References

{:.gc-ref}


| Resource | Link |
| --- | --- |
| Linux kernel: NFS root documentation | kernel.org/doc/html/latest/admin-guide/nfs/nfsroot.html |
| Linux kernel: initramfs documentation | kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html |
| man 8 switch_root | switch_root manual page |
| man 8 pivot_root | pivot_root manual page |
| man 5 exports | NFS exports file format |
| man 8 exportfs | NFS export management |
| dracut documentation | dracut.wiki.kernel.org |
| mkinitramfs / initramfs-tools | manpages.debian.org/initramfs-tools |
| Bootlin: Embedded Linux Filesystems | bootlin.com/doc/training/embedded-linux/ |
| NFS Howto | nfs.sourceforge.net |
| QEMU ARM system emulation | qemu.org/docs/master/system/target-arm.html |