Stage 09

Embedded Filesystem Types

Choose and use the right filesystem for embedded Linux — ext4, squashfs read-only rootfs, tmpfs for RAM, overlayfs for layered writes, UBIFS/JFFS2 for raw NAND flash, and image creation tools.


ext4

{:.gc-basic}

Basic

ext4 (Fourth Extended Filesystem) is the workhorse block filesystem of Linux. It introduced extents (replacing indirect block pointers), delayed allocation, persistent pre-allocation, and journal checksums. For embedded systems it is the right choice when you have a managed flash device (eMMC, SD card, SATA SSD) that handles its own wear-leveling.

Key Features

| Feature | Details |
|---|---|
| Journaling | Metadata journal (default) or data+metadata (data=journal) |
| Extents | Contiguous block ranges replace indirect block maps — better for large files |
| Max file size | 16 TiB |
| Max volume size | 1 EiB |
| Online resize | resize2fs can grow a live filesystem |
| Directory indexing | HTree (hash tree) for fast lookups in large directories |

Creating an ext4 Image for Embedded

# 128 MB rootfs image
dd if=/dev/zero of=rootfs.ext4 bs=1M count=128

# -b 4096  : block size — 4096 is default; use 1024 for many small files
# -i 4096  : bytes-per-inode ratio — lower = more inodes (more files)
# -L rootfs: volume label
# -O ^has_journal : disable journal for read-mostly partitions (saves space)
mkfs.ext4 -b 4096 -i 4096 -L rootfs rootfs.ext4

# Mount as loop device and populate
sudo mount -o loop rootfs.ext4 /mnt/target
sudo cp -a $ROOTFS/* /mnt/target/
sudo umount /mnt/target

# Check and repair
e2fsck -f rootfs.ext4

# Shrink to minimum size
resize2fs -M rootfs.ext4
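The -i ratio fixes the inode budget at mkfs time and cannot be raised later, so it is worth checking the arithmetic up front. For the 128 MB image and -i 4096 used above:

```shell
# One inode is allocated per -i bytes of filesystem capacity.
image_bytes=$((128 * 1024 * 1024))        # 128 MiB image from the dd step
bytes_per_inode=4096                      # the -i value passed to mkfs.ext4
echo $((image_bytes / bytes_per_inode))   # → 32768 inodes available
```

If the rootfs holds more files than that (lots of tiny scripts, locale data), lower -i: running out of inodes fails with "No space left on device" even while free blocks remain.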

tune2fs — Adjust Parameters After Creation

# Show filesystem parameters
tune2fs -l rootfs.ext4

# Reduce mount count before auto-fsck (set to 0 to disable count-based check)
tune2fs -c 0 -i 0 /dev/mmcblk0p2

# Set reserved block percentage to 0% (default is 5% — wasteful on embedded)
tune2fs -m 0 /dev/mmcblk0p2

Mount Options for Embedded Flash

# /etc/fstab for an embedded eMMC partition
/dev/mmcblk0p2  /  ext4  noatime,nodiratime,data=writeback,errors=remount-ro  0 1

| Option | Purpose |
|---|---|
| noatime | Do not update access timestamps — reduces write amplification significantly |
| nodiratime | Do not update directory access timestamps (implied by noatime) |
| data=writeback | Journal metadata only — fastest, but file data can be stale after a crash |
| data=ordered | Default — data is flushed to disk before the metadata journal commit |
| errors=remount-ro | Remount read-only on error instead of panicking — important for embedded |
| commit=60 | Raise the journal commit interval (default 5 s) to reduce writes |
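fstab entries only express intent; the kernel's view of the active options lives in /proc/mounts. A quick way to confirm what was actually applied (shown here for the root mount — substitute your mount point):

```shell
# Print the options the kernel applied to the root filesystem.
# Compare against the fstab entry to catch ignored or remapped options.
awk '$2 == "/" { print $4; exit }' /proc/mounts
```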

squashfs — Read-Only Compressed

{:.gc-basic}

Basic

SquashFS is a compressed, read-only filesystem designed for compact system images. It compresses data and metadata into a single image while still allowing random access via per-block decompression. A read-only rootfs has significant embedded advantages:

  • No filesystem corruption — power can be cut at any time
  • Atomic OTA updates — swap out the image file or partition
  • Smaller storage — 40–60% compression ratio typical with zstd/xz
  • Faster reads — decompression from flash is often faster than uncompressed reads from slow NAND

Creating SquashFS Images

# Basic — default gzip compression
mksquashfs $ROOTFS rootfs.squashfs

# Production — zstd (best speed/ratio balance)
mksquashfs $ROOTFS rootfs.squashfs \
    -comp zstd \
    -Xcompression-level 15 \
    -b 131072 \
    -noappend \
    -no-progress

# Maximum compression — xz (best ratio, slower decompress)
mksquashfs $ROOTFS rootfs.squashfs \
    -comp xz \
    -b 262144 \
    -noappend

# Embedded optimised — lz4 (fastest decompress, lowest CPU on boot)
mksquashfs $ROOTFS rootfs.squashfs \
    -comp lz4 \
    -Xhc \
    -b 65536 \
    -noappend

Compression Comparison

| Algorithm | Ratio | Decompress speed | Compress speed | Best for |
|---|---|---|---|---|
| gzip | Good | Moderate | Moderate | Default / compatibility |
| lzo | Moderate | Very fast | Fast | Low-power CPUs |
| lz4 | Lower | Fastest | Fastest | Boot speed critical |
| zstd | Excellent | Fast | Fast | Recommended default |
| xz | Best | Slow | Very slow | Smallest image size |

# Mount a squashfs image
sudo mount -t squashfs -o loop rootfs.squashfs /mnt/target

# Inspect contents without mounting
unsquashfs -l rootfs.squashfs | head -30

# Extract to directory
unsquashfs -d extracted/ rootfs.squashfs

Typical Embedded Usage Pattern

NOR/NAND Flash Layout:
  Partition 0: U-Boot bootloader      (512 KB)
  Partition 1: Kernel + dtb           (8 MB)
  Partition 2: SquashFS rootfs (ro)   (32 MB)   ← atomic OTA target
  Partition 3: ext4 data (rw)         (rest)
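A raw-flash layout like this is usually described to the kernel on the command line with the mtdparts= parameter. A sketch matching the sizes above (the mtd-id nand0 is board-specific — check /proc/mtd on your target):

```text
mtdparts=nand0:512k(u-boot),8m(kernel),32m(rootfs),-(data)
```

Each (name) becomes an MTD partition; '-' assigns the remaining space to the last one.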

tmpfs — RAM Filesystem

{:.gc-mid}

Intermediate

tmpfs is a virtual filesystem that lives entirely in kernel memory (and swap, if available). Unlike ramfs (which grows without bound and cannot be swapped), tmpfs enforces a configurable size limit and can push idle pages out to swap under memory pressure.

Characteristics

  • Files live in kernel memory — lost on reboot or unmount
  • Memory is only consumed when files are actually written (sparse)
  • Backed by anonymous memory, with swap as overflow
  • Extremely fast — no disk I/O at all
  • Supports extended attributes and POSIX ACLs

Mount Options

# Mount a 128 MB tmpfs on /tmp
mount -t tmpfs -o size=128m,mode=1777 tmpfs /tmp

# /run — runtime data (PID files, sockets)
mount -t tmpfs -o size=32m,mode=755 tmpfs /run

# /dev/shm — POSIX shared memory
mount -t tmpfs -o size=64m tmpfs /dev/shm

| Option | Description |
|---|---|
| size=N | Maximum size in bytes, KiB (k), MiB (m), GiB (g), or % of RAM |
| mode=OCTAL | Permissions of the mount-point root directory |
| uid=N | Owner UID of the mount-point root |
| gid=N | Owner GID of the mount-point root |
| nr_inodes=N | Maximum number of inodes (default: half of RAM pages) |
| nr_blocks=N | Maximum number of blocks |

# /etc/fstab entries for typical embedded tmpfs mounts
tmpfs  /tmp          tmpfs  size=64m,mode=1777,nosuid,nodev   0 0
tmpfs  /run          tmpfs  size=16m,mode=755,nosuid,nodev    0 0
tmpfs  /var/volatile tmpfs  size=32m,mode=755                 0 0
tmpfs  /dev/shm      tmpfs  size=32m,mode=1777,nosuid,nodev   0 0

Use Cases on Embedded Systems

  • /tmp — temporary files (always volatile)
  • /run — PID files, Unix sockets, runtime state
  • /var/volatile — when /var needs to be writable but is on a read-only rootfs
  • /var/log — log files on systems without persistent storage
  • Build-system convention: mount a tmpfs on /var/volatile and symlink volatile /var subdirectories (e.g., /var/log) into it
# Check current tmpfs usage
df -h -t tmpfs

# Verify size limit is respected
mount | grep tmpfs
# tmpfs on /tmp type tmpfs (rw,nosuid,nodev,size=65536k,mode=1777)
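The "memory only consumed when written" property is sparse allocation: apparent size and allocated blocks are tracked separately. A small demonstration (uses a temp file — the same accounting applies inside a tmpfs mount):

```shell
# Reserve 10 MiB of apparent size without writing any data.
f=$(mktemp)
truncate -s 10M "$f"
ls -l "$f" | awk '{ print $5 }'   # apparent size: 10485760 bytes
du -k "$f" | awk '{ print $1 }'   # allocated: typically 0 (no data written)
rm -f "$f"
```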

overlayfs — Layered Filesystem

{:.gc-mid}

Intermediate

overlayfs merges two directory trees: a lower (read-only) layer and an upper (read-write) layer. Reads come from upper if the file exists there, otherwise from lower. Writes always go to upper. The merged directory is the union view presented to users.

Directory Structure

lower/     ← read-only (e.g., squashfs rootfs)
upper/     ← read-write (e.g., tmpfs or ext4 partition)
workdir/   ← overlayfs internal use (must be on same fs as upper)
merged/    ← the unified view (mounted here)

Mounting overlayfs

# Create the required directories
mkdir -p /overlay/{upper,work}
mkdir -p /overlay/merged

# Mount — lower is read-only squashfs already at /ro
mount -t overlay overlay \
    -o lowerdir=/ro,upperdir=/overlay/upper,workdir=/overlay/work \
    /overlay/merged

# Multiple lower layers — the leftmost directory is the topmost (highest priority)
mount -t overlay overlay \
    -o lowerdir=/layer2:/layer1:/base,upperdir=/upper,workdir=/work \
    /merged

Practical Embedded OTA Pattern

Boot with squashfs (ro) + overlayfs (rw tmpfs upper):

  /lower    ← mount squashfs here (read-only, compressed)
  /upper    ← tmpfs (writable, lost on reboot)  OR  ext4 partition (persistent writes)
  /workdir  ← same filesystem as upper
  /         ← overlayfs merged view (read-write from user perspective)
# Typical rcS sequence for overlayfs rootfs
mount -t squashfs /dev/mtdblock2 /lower
mount -t tmpfs tmpfs /upper -o size=64m
mkdir -p /upper/data /upper/work
mount -t overlay overlay \
    -o lowerdir=/lower,upperdir=/upper/data,workdir=/upper/work \
    /mnt/newroot
exec switch_root /mnt/newroot /sbin/init

Docker’s Use of overlayfs

Docker’s overlay2 storage driver is built on overlayfs. Each image layer contributes to lowerdir; the running container gets its own upperdir for writes. Deleting the container discards the upperdir, leaving the image layers untouched.

overlayfs Limitations

| Limitation | Detail |
|---|---|
| Hard links | Cannot create hard links that span the lower and upper layers |
| fsync propagation | fsync on the merged view does not guarantee lower-layer durability |
| Rename across layers | Renaming a lower-layer file becomes a copy-up plus a whiteout |
| NFS upper | NFS cannot be used as upperdir; an NFS lower works, but remote changes while mounted are undefined |
| Copy-up overhead | The first write to a lower-layer file triggers a full copy-up to upper |

JFFS2 and UBIFS for Raw NAND

{:.gc-adv}

Advanced

Raw NAND flash differs fundamentally from managed flash (eMMC, SD, SSD):

  • No built-in FTL (Flash Translation Layer)
  • Must handle erase blocks (128 KB – 2 MB typical)
  • Subject to write endurance (10,000–100,000 cycles per block)
  • Prone to bit errors — requires ECC
  • Cannot overwrite — must erase entire block before writing
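The erase-before-write rule is what drives wear: modifying a single page can force a read-erase-rewrite of the whole block. Quick numbers for a common geometry (2 KiB pages, 128 KiB erase blocks — adjust for your part):

```shell
page_size=2048                      # bytes per NAND page
erase_block=$((128 * 1024))         # bytes per erase block
# Worst-case write amplification for a one-page update equals
# the number of pages sharing the erase block.
echo $((erase_block / page_size))   # → 64 pages per erase block
```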

Linux exposes raw NAND through the MTD (Memory Technology Device) subsystem:

# View MTD partitions
cat /proc/mtd
# dev:    size   erasesize  name
# mtd0: 00080000 00020000 "u-boot"
# mtd1: 00500000 00020000 "kernel"
# mtd2: 07a80000 00020000 "rootfs"

# Character devices: /dev/mtd0 (raw), /dev/mtdblock0 (block emulation)
mtdinfo /dev/mtd2

JFFS2 (Journalling Flash File System 2)

JFFS2 was the original Linux flash filesystem. It stores files as linked lists of nodes written sequentially across the flash.

Characteristics:

  • Built-in wear leveling via log-structured writes
  • Transparent compression (zlib, rtime, LZO)
  • Power-fail safe — nodes are written atomically
  • Slow mount time on large partitions (must scan all nodes at boot)
  • Recommended for NOR flash and small NAND partitions (<64 MB)
# Create JFFS2 image (requires mtd-utils)
# -e 128KiB      : erase block size — must match the hardware
# --pad=0x7a80000: pad the image to the full partition size
# -n             : no cleanmarkers in the image (NAND stores them in OOB)
mkfs.jffs2 -r $ROOTFS -o rootfs.jffs2 -e 128KiB --pad=0x7a80000 -n

# Flash to MTD partition
flashcp -v rootfs.jffs2 /dev/mtd2

# Mount
mount -t jffs2 /dev/mtdblock2 /mnt

UBIFS (Unsorted Block Image File System)

UBIFS runs on top of UBI (Unsorted Block Images) — a volume-management layer that handles wear leveling, bad-block management, and bit-flip scrubbing so the filesystem does not have to.

Application
    │
  UBIFS      ← filesystem with sorted B-tree index
    │
   UBI       ← volume manager, wear leveling, bad block handling
    │
   MTD       ← raw NAND access with ECC
    │
  NAND       ← physical flash

UBIFS vs JFFS2:

| Feature | JFFS2 | UBIFS |
|---|---|---|
| Mount time | O(n) — scans all nodes | Near-constant — indexed B-tree |
| Write performance | Moderate | High |
| Suitable flash size | < 64 MB | Any size |
| Compression | Yes (zlib/lzo) | Yes (lzo/zstd) |
| Power-fail safety | Yes | Yes |
| Wear leveling | Built-in | Via UBI layer |
| ECC | Via MTD | Via MTD (UBI scrubs bit-flips) |

# Step 1: Attach MTD device to UBI
ubiattach /dev/ubi_ctrl -m 2 -d 0   # mtd2 → ubi0

# Step 2: Create UBI volume
ubimkvol /dev/ubi0 -n 0 -N rootfs -s 120MiB

# Step 3: Create UBIFS image
# -m 2048  : minimum I/O unit (NAND page size)
# -e 126976: logical erase block size (PEB minus two pages of UBI headers)
# -c 1000  : maximum number of logical erase blocks (caps filesystem size)
mkfs.ubifs -r $ROOTFS -o rootfs.ubifs -m 2048 -e 126976 -c 1000

# Step 4: Package into a UBI image (for flashing)
ubinize -o ubi.img -m 2048 -p 131072 ubinize.cfg

# ubinize.cfg:
# [ubifs]
# mode=ubi
# image=rootfs.ubifs
# vol_id=0
# vol_size=120MiB
# vol_type=dynamic
# vol_name=rootfs
# vol_flags=autoresize

# Flash the UBI image (erase the whole partition first)
flash_erase /dev/mtd2 0 0
nandwrite -p /dev/mtd2 ubi.img

# Mount UBIFS
mount -t ubifs ubi0:rootfs /mnt
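The -e 126976 passed to mkfs.ubifs is derived, not arbitrary: UBI stores two headers (erase counter and volume ID) at the start of every physical erase block, each occupying one min-I/O page, so the logical erase block is the PEB minus two pages. With the geometry used above:

```shell
peb=$((128 * 1024))      # physical erase block — the ubinize -p value
page=2048                # min I/O unit (NAND page) — the -m value
echo $((peb - 2 * page)) # → 126976, the -e value for mkfs.ubifs
```

On NAND with sub-page write support the overhead can be smaller; ubinfo (mtd-utils) reports the exact logical eraseblock size for an attached device.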

Filesystem Image Creation

{:.gc-adv}

Advanced

Creating ext2/ext3/ext4 Without Root (genext2fs)

# genext2fs creates ext2 images without root privileges or loop mounts
# -b 65536: image size in 1 KiB blocks (= 64 MiB)
# -d DIR  : source directory to populate the image from
# -i 4096 : bytes per inode
# -U      : squash owners — all files become uid/gid 0
genext2fs -b 65536 -d $ROOTFS -i 4096 -U rootfs.ext2

# Convert to ext4
tune2fs -O extents,uninit_bg,dir_index rootfs.ext2
e2fsck -f rootfs.ext2
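genext2fs counts -b in 1 KiB blocks regardless of the filesystem block size, which is an easy unit to trip over. Checking the values used above:

```shell
blocks=65536                 # the -b value, in 1 KiB units
bytes=$((blocks * 1024))
echo "$bytes"                # → 67108864 bytes (64 MiB)
echo $((bytes / 4096))       # → 16384 inodes at -i 4096
```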

fakeroot — Correct Permissions Without Root

# fakeroot intercepts file ownership calls and tracks them in a database
# Allows creating images with root-owned files as a non-root user

fakeroot -- bash -c '
    cp -a $ROOTFS /tmp/staging
    chown -R root:root /tmp/staging
    chmod 4755 /tmp/staging/bin/su
    mksquashfs /tmp/staging rootfs.squashfs -comp zstd
'

Complete Image Creation Table

| Format | Command | Read-write | Compression | Flash type | Root required |
|---|---|---|---|---|---|
| ext4 | mkfs.ext4 | Yes | No | eMMC/SD/HDD | Yes (or loop) |
| ext2 (no root) | genext2fs | Yes | No | eMMC/SD | No |
| SquashFS | mksquashfs | No | Yes | Any | No (with fakeroot) |
| cpio initramfs | find \| cpio | RAM only | Via gzip/xz | Any | No |
| JFFS2 | mkfs.jffs2 | Yes | Yes | NOR/NAND (MTD) | No |
| UBIFS | mkfs.ubifs + ubinize | Yes | Yes | NAND (UBI) | No |
| SquashFS + overlayfs | combined | Apparent yes | Partial (lower only) | Any | Partial |

Buildroot Image Generation Pipeline

# Buildroot handles all of this automatically — shown here for understanding
# output/images/ contains the final artifacts:
ls output/images/
# rootfs.ext4      ← ext4 for eMMC targets
# rootfs.squashfs  ← squashfs for read-only targets
# rootfs.tar.gz    ← tarball for NFS or container base
# sdcard.img       ← full disk image with partition table

Interview Q&A

{:.gc-iq}

Interview Q&A

Q1 — Why is SquashFS preferred for a production embedded rootfs?

SquashFS provides three key properties for production: (1) power-fail safety — because it is read-only, there is no risk of filesystem corruption from unexpected power loss; (2) atomic OTA — you replace or swap the entire image atomically rather than patching individual files; (3) compression — it reduces flash consumption and can improve read throughput on slow NAND by spending CPU cycles to decompress rather than waiting for I/O. The combination of overlayfs for a writable upper layer gives back the ability to modify files at runtime.

Q2 — In overlayfs, what is the difference between lowerdir and upperdir, and what happens on a write?

lowerdir is the read-only base layer (typically a SquashFS mount). upperdir is the read-write layer (typically a tmpfs or ext4 partition). When a file in lowerdir is written for the first time, overlayfs performs a copy-up: it copies the entire file from lowerdir to upperdir, then modifies the copy. Subsequent writes go directly to upperdir. Deletion is handled with a whiteout file in upperdir that masks the lower-layer entry.

Q3 — What is the default size of a tmpfs mount if no size= option is given?

By default, tmpfs is limited to half of physical RAM. On a system with 512 MB RAM, an unconfigured tmpfs can grow to 256 MB. This is why /etc/fstab entries for /tmp, /run, and similar should always specify an explicit size= limit, especially on memory-constrained embedded systems.
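The effective default can be read straight off the running system; a sketch that computes MemTotal/2 from /proc/meminfo (Linux-only):

```shell
# Unconfigured tmpfs mounts are capped at half of physical RAM.
awk '/^MemTotal:/ { printf "%d kB\n", $2 / 2 }' /proc/meminfo
```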

Q4 — What are the three ext4 journal modes, and when would you use each?

data=journal — both data and metadata go through the journal. Safest, slowest. Rarely used. data=ordered — data is written to disk before its metadata is committed to the journal. Default mode. Good balance of safety and performance. data=writeback — metadata is journaled but data write-ordering is not guaranteed. Fastest, but data can be stale after a crash (though never corrupt). Preferred for embedded flash where write endurance matters more than ordered-write guarantees.

Q5 — When should you choose UBIFS over JFFS2?

UBIFS should be chosen for any raw NAND partition larger than ~64 MB. JFFS2’s mount time scales linearly with the number of nodes in the filesystem — on a 256 MB NAND partition it can take 30–60 seconds to mount at boot. UBIFS keeps a B-tree index on the UBI volume, so mount time is near-constant regardless of size. JFFS2 remains appropriate for small NOR flash chips (typically < 16 MB) where the UBI overhead is not worth the setup complexity.

Q6 — Why should embedded systems use noatime in their mount options?

Every file read updates the atime (access time) field in the inode, which requires a write to flash. On a system that reads files frequently but does not need access time tracking (virtually all embedded applications), this causes pointless write amplification — shortening flash lifespan and reducing performance. noatime disables these writes entirely. relatime (the Linux default since 2.6.30) is a compromise that only updates atime if it is older than mtime or once per day, but noatime is still preferred for flash-heavy embedded targets.

Q7 — How do you create a disk image with a partition table for an eMMC target?

# Create a 1 GB disk image with MBR partition table
dd if=/dev/zero of=sdcard.img bs=1M count=1024

# Partition: 64 MB boot (FAT32) + rest as rootfs (ext4)
parted -s sdcard.img \
    mklabel msdos \
    mkpart primary fat32 4MiB 68MiB \
    mkpart primary ext4 68MiB 100% \
    set 1 boot on

# Format partitions via loop device
sudo losetup -fP sdcard.img
LOOP=$(losetup -j sdcard.img | cut -d: -f1)
sudo mkfs.vfat -F32 -n BOOT ${LOOP}p1
sudo mkfs.ext4 -L rootfs    ${LOOP}p2
# Copy content, then:
sudo losetup -d $LOOP
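On systems where losetup -P is unavailable, individual partitions can also be mounted straight from the image with a byte offset. The offsets follow from the parted commands above (partition 1 at 4 MiB, partition 2 at 68 MiB):

```shell
# Byte offsets of the two partitions created above.
p1_off=$((4 * 1024 * 1024))       # boot (FAT32) start
p2_off=$((68 * 1024 * 1024))      # rootfs (ext4) start
echo "$p1_off $p2_off"            # → 4194304 71303168
# Then, for example:
#   sudo mount -o loop,offset=$p2_off sdcard.img /mnt/target
```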

References

{:.gc-ref}

References

| Resource | Link |
|---|---|
| Linux kernel — SquashFS documentation | kernel.org/doc/html/latest/filesystems/squashfs.html |
| Linux kernel — overlayfs documentation | kernel.org/doc/html/latest/filesystems/overlayfs.html |
| Linux kernel — UBIFS documentation | kernel.org/doc/html/latest/filesystems/ubifs.html |
| MTD (Memory Technology Devices) | linux-mtd.infradead.org |
| mtd-utils project | git.infradead.org/mtd-utils.git |
| ext4 wiki | ext4.wiki.kernel.org |
| Buildroot filesystem generation | buildroot.org/downloads/manual/manual.html#_filesystem_images |
| man 8 mkfs.ext4 | ext4 filesystem creation |
| man 8 mksquashfs | SquashFS image creation |
| man 8 mkfs.ubifs | UBIFS image creation |
| Bootlin embedded Linux slides | bootlin.com/doc/training/embedded-linux/ |