ext4
{:.gc-basic}
Basic
ext4 (Fourth Extended Filesystem) is the workhorse block filesystem of Linux. It introduced extents (replacing indirect block pointers), delayed allocation, persistent pre-allocation, and journal checksums. For embedded systems it is the right choice when you have a managed flash device (eMMC, SD card, SATA SSD) that handles its own wear-leveling.
Key Features
| Feature | Details |
|---|---|
| Journaling | Metadata journal (default) or data+metadata (data=journal) |
| Extents | Contiguous block ranges replace indirect block maps — better for large files |
| Max file size | 16 TiB (with 4 KiB blocks) |
| Max volume size | 1 EiB |
| Online resize | resize2fs can grow a live filesystem |
| Directory indexing | HTree (hash-tree) for fast large directory lookups |
Creating an ext4 Image for Embedded
# 128 MB rootfs image
dd if=/dev/zero of=rootfs.ext4 bs=1M count=128
# -b 4096 : block size — 4096 is default; use 1024 for many small files
# -i 4096 : bytes-per-inode ratio — lower = more inodes (more files)
# -L rootfs: volume label
# -O ^has_journal : disable journal for read-mostly partitions (saves space)
mkfs.ext4 -b 4096 -i 4096 -L rootfs rootfs.ext4
# Mount as loop device and populate
sudo mount -o loop rootfs.ext4 /mnt/target
sudo cp -a $ROOTFS/* /mnt/target/
sudo umount /mnt/target
# Check and repair
e2fsck -f rootfs.ext4
# Shrink to minimum size
resize2fs -M rootfs.ext4
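As a sanity check on the `-i` ratio above, the inode budget is roughly the image size divided by the bytes-per-inode value (mke2fs rounds to fit block groups, so treat this as an estimate; sizes here are the 128 MB example):

```shell
# Approximate inode count = image size / bytes-per-inode ratio
image_bytes=$((128 * 1024 * 1024))   # the 128 MB image created above
bytes_per_inode=4096                 # the -i value passed to mkfs.ext4
echo "approx inodes: $((image_bytes / bytes_per_inode))"   # prints: approx inodes: 32768
```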
tune2fs — Adjust Parameters After Creation
# Show filesystem parameters
tune2fs -l rootfs.ext4
# Disable automatic fsck: -c 0 turns off the count-based check, -i 0 the time-based one
tune2fs -c 0 -i 0 /dev/mmcblk0p2
# Set reserved block percentage to 0% (default is 5% — wasteful on embedded)
tune2fs -m 0 /dev/mmcblk0p2
Mount Options for Embedded Flash
# /etc/fstab for an embedded eMMC partition
/dev/mmcblk0p2 / ext4 noatime,nodiratime,data=writeback,errors=remount-ro 0 1
| Option | Purpose |
|---|---|
| noatime | Do not update access timestamps — reduces write amplification significantly |
| nodiratime | Do not update directory access timestamps |
| data=writeback | Only journal metadata, not data — highest performance, but data can be stale after a crash |
| data=ordered | Default — data is flushed before the metadata journal commit |
| errors=remount-ro | Remount read-only on error instead of crashing — important for embedded |
| commit=60 | Increase the journal commit interval (default 5 s) to reduce writes |
squashfs — Read-Only Compressed
{:.gc-basic}
Basic
SquashFS is a compressed, read-only filesystem designed for packaging. It compresses data and metadata together into a single image with random-access decompression (block-based). A read-only rootfs has significant embedded advantages:
- No filesystem corruption — power can be cut at any time
- Atomic OTA updates — swap out the image file or partition
- Smaller storage — 40–60% compression ratio typical with zstd/xz
- Faster reads — decompression from flash is often faster than uncompressed reads from slow NAND
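To put a number on the compression claim for your own rootfs, compare the staged tree size (e.g. from `du -sb $ROOTFS`) against the resulting image size. The helper below is a small illustrative sketch; the sample sizes are made up:

```shell
# Hypothetical helper: compressed size as a percentage of the original.
compression_ratio() {
    # $1 = uncompressed bytes, $2 = compressed bytes
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.0f%%\n", 100 * c / o }'
}

# Example: a 100 MiB rootfs tree that compresses to a 45 MiB image
compression_ratio $((100 * 1024 * 1024)) $((45 * 1024 * 1024))   # prints 45%
```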
Creating SquashFS Images
# Basic — default gzip compression
mksquashfs $ROOTFS rootfs.squashfs
# Production — zstd (best speed/ratio balance)
mksquashfs $ROOTFS rootfs.squashfs \
-comp zstd \
-Xcompression-level 15 \
-b 131072 \
-noappend \
-no-progress
# Maximum compression — xz (best ratio, slower decompress)
mksquashfs $ROOTFS rootfs.squashfs \
-comp xz \
-b 262144 \
-noappend
# Embedded optimised — lz4 (fastest decompress, lowest CPU on boot)
mksquashfs $ROOTFS rootfs.squashfs \
-comp lz4 \
-Xhc \
-b 65536 \
-noappend
Compression Comparison
| Algorithm | Ratio | Decompress Speed | Compress Speed | Best For |
|---|---|---|---|---|
| gzip | Good | Moderate | Moderate | Default/compatibility |
| lzo | Moderate | Very fast | Fast | Low-power CPUs |
| lz4 | Lower | Fastest | Fastest | Boot speed critical |
| zstd | Excellent | Fast | Fast | Recommended default |
| xz | Best | Slow | Very slow | Smallest image size |
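One factor the algorithm choice interacts with is the `-b` block size: larger blocks compress better, but reading even a single 4 KiB page forces decompression of the whole block. A back-of-the-envelope sketch (illustrative arithmetic only; real cost depends on cache hits):

```shell
# Worst-case read amplification = squashfs block size / page size (4 KiB)
read_amplification() {
    echo $(( $1 / 4096 ))   # $1 = squashfs block size in bytes
}
read_amplification 131072    # -b 131072 → 32
read_amplification 1048576   # -b 1M     → 256
```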
# Mount a squashfs image
sudo mount -t squashfs -o loop rootfs.squashfs /mnt/target
# Inspect contents without mounting
unsquashfs -l rootfs.squashfs | head -30
# Extract to directory
unsquashfs -d extracted/ rootfs.squashfs
Typical Embedded Usage Pattern
NOR/NAND Flash Layout:
Partition 0: U-Boot bootloader (512 KB)
Partition 1: Kernel + dtb (8 MB)
Partition 2: SquashFS rootfs (ro) (32 MB) ← atomic OTA target
Partition 3: ext4 data (rw) (rest)
tmpfs — RAM Filesystem
{:.gc-mid}
Intermediate
tmpfs is a virtual filesystem implemented entirely in kernel memory (and swap, if available). Unlike ramfs, which grows without bound and whose pages cannot be swapped, tmpfs enforces a configurable size limit and its pages can be written out to swap under memory pressure.
Characteristics
- Files live in kernel memory — lost on reboot or unmount
- Memory is only consumed when files are actually written (sparse)
- Backed by anonymous pages, with swap as overflow when available
- Extremely fast — no disk I/O at all
- Supports extended attributes and POSIX ACLs
Mount Options
# Mount a 128 MB tmpfs on /tmp
mount -t tmpfs -o size=128m,mode=1777 tmpfs /tmp
# /run — runtime data (PID files, sockets)
mount -t tmpfs -o size=32m,mode=755 tmpfs /run
# /dev/shm — POSIX shared memory
mount -t tmpfs -o size=64m tmpfs /dev/shm
| Option | Description |
|---|---|
| size=N | Maximum size in bytes, KiB (k), MiB (m), GiB (g), or % of RAM |
| mode=OCTAL | Permissions of the mount point root directory |
| uid=N | Owner UID of the mount point root |
| gid=N | Owner GID of the mount point root |
| nr_inodes=N | Maximum number of inodes (default: half of RAM pages) |
| nr_blocks=N | Maximum number of blocks |
# /etc/fstab entries for typical embedded tmpfs mounts
tmpfs /tmp tmpfs size=64m,mode=1777,nosuid,nodev 0 0
tmpfs /run tmpfs size=16m,mode=755,nosuid,nodev 0 0
tmpfs /var/volatile tmpfs size=32m,mode=755 0 0
tmpfs /dev/shm tmpfs size=32m,mode=1777,nosuid,nodev 0 0
Use Cases on Embedded Systems
- `/tmp` — temporary files (always volatile)
- `/run` — PID files, Unix sockets, runtime state
- `/var/volatile` — when `/var` needs to be writable but is on a read-only rootfs
- `/var/log` — log files on systems without persistent storage
- Buildroot default: `/var` is a symlink to `/var/volatile` (a tmpfs)
# Check current tmpfs usage
df -h -t tmpfs
# Verify size limit is respected
mount | grep tmpfs
# tmpfs on /tmp type tmpfs (rw,nosuid,nodev,size=65536k,mode=1777)
overlayfs — Layered Filesystem
{:.gc-mid}
Intermediate
overlayfs merges two directory trees: a lower (read-only) layer and an upper (read-write) layer. Reads come from upper if the file exists there, otherwise from lower. Writes always go to upper. The merged directory is the union view presented to users.
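The lookup rule can be illustrated with ordinary directories, no mount required; this toy model (hypothetical paths) mimics what the merged view returns for a single file:

```shell
# Toy model of overlayfs lookup order: a name resolves to the upper copy
# if one exists, otherwise it falls through to the lower layer.
demo=$(mktemp -d)
mkdir -p "$demo/lower" "$demo/upper"
echo "from lower" > "$demo/lower/a.txt"
echo "from lower" > "$demo/lower/b.txt"
echo "from upper" > "$demo/upper/b.txt"   # upper shadows lower

lookup() {
    if [ -e "$demo/upper/$1" ]; then cat "$demo/upper/$1"
    else cat "$demo/lower/$1"; fi
}
lookup a.txt   # → from lower
lookup b.txt   # → from upper
```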
Directory Structure
lower/ ← read-only (e.g., squashfs rootfs)
upper/ ← read-write (e.g., tmpfs or ext4 partition)
workdir/ ← overlayfs internal use (must be on same fs as upper)
merged/ ← the unified view (mounted here)
Mounting overlayfs
# Create the required directories
mkdir -p /overlay/{upper,work}
mkdir -p /overlay/merged
# Mount — lower is read-only squashfs already at /ro
mount -t overlay overlay \
-o lowerdir=/ro,upperdir=/overlay/upper,workdir=/overlay/work \
/overlay/merged
# Multiple lower layers — colon-separated; the leftmost directory is the
# top-most (highest priority) layer
mount -t overlay overlay \
-o lowerdir=/layer2:/layer1:/base,upperdir=/upper,workdir=/work \
/merged
Practical Embedded OTA Pattern
Boot with squashfs (ro) + overlayfs (rw tmpfs upper):
/lower ← mount squashfs here (read-only, compressed)
/upper ← tmpfs (writable, lost on reboot) OR ext4 partition (persistent writes)
/workdir ← same filesystem as upper
/ ← overlayfs merged view (read-write from user perspective)
# Typical rcS sequence for overlayfs rootfs
mount -t squashfs /dev/mtdblock2 /lower
mount -t tmpfs tmpfs /upper -o size=64m
mkdir -p /upper/data /upper/work
mount -t overlay overlay \
-o lowerdir=/lower,upperdir=/upper/data,workdir=/upper/work \
/mnt/newroot
exec switch_root /mnt/newroot /sbin/init
Docker’s Use of overlayfs
Docker’s overlay2 storage driver is built on overlayfs. Each image layer becomes a lowerdir; the running container gets its own upperdir for writes. Deleting the container simply discards the upperdir, leaving the shared image layers untouched.
overlayfs Limitations
| Limitation | Detail |
|---|---|
| Hardlinks | Cannot create hardlinks that span lower and upper layers |
| fsync propagation | fsync on the merged view does not guarantee lower-layer durability |
| rename across layers | Moving a file out of a lower-layer directory results in a copy-up plus a whiteout |
| NFS lower | overlayfs over an NFS lower layer is not supported in the upstream kernel |
| Copy-up overhead | First write to a lower-layer file triggers a full copy-up to upper |
JFFS2 and UBIFS for Raw NAND
{:.gc-adv}
Advanced
Raw NAND flash differs fundamentally from managed flash (eMMC, SD, SSD):
- No built-in FTL (Flash Translation Layer)
- Must handle erase blocks (128 KB – 2 MB typical)
- Subject to write endurance (10,000–100,000 cycles per block)
- Prone to bit errors — requires ECC
- Cannot overwrite — must erase entire block before writing
Linux exposes raw NAND through the MTD (Memory Technology Device) subsystem:
# View MTD partitions
cat /proc/mtd
# dev: size erasesize name
# mtd0: 00080000 00020000 "u-boot"
# mtd1: 00500000 00020000 "kernel"
# mtd2: 07a80000 00020000 "rootfs"
# Character devices: /dev/mtd0 (raw), /dev/mtdblock0 (block emulation)
mtdinfo /dev/mtd2
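The size fields in /proc/mtd are hexadecimal; a quick shell calculation turns the example mtd2 line above into an erase block count:

```shell
# Values taken from the example /proc/mtd output for mtd2 ("rootfs")
size=0x07a80000        # partition size in bytes
erasesize=0x00020000   # erase block size in bytes (128 KiB)
echo "$(( size / erasesize )) erase blocks of $(( erasesize / 1024 )) KiB"
# prints: 980 erase blocks of 128 KiB
```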
JFFS2 (Journalling Flash File System 2)
JFFS2 was the original Linux flash filesystem. It stores files as linked lists of nodes written sequentially across the flash.
Characteristics:
- Built-in wear leveling via log-structured writes
- Transparent compression (zlib, rtime, LZO)
- Power-fail safe — nodes are written atomically
- Slow mount time on large partitions (must scan all nodes at boot)
- Recommended for NOR flash and small NAND partitions (<64 MB)
# Create JFFS2 image (requires mtd-utils)
# -e : erase block size — must match the hardware
# --pad : pad the image to the partition size
# -n : do not add cleanmarkers (for NAND, which keeps them in the OOB area)
mkfs.jffs2 \
-r $ROOTFS \
-o rootfs.jffs2 \
-e 128KiB \
--pad=0x7a80000 \
-n
# Flash to MTD partition
flashcp -v rootfs.jffs2 /dev/mtd2
# Mount
mount -t jffs2 /dev/mtdblock2 /mnt
UBIFS (Unsorted Block Image File System)
UBIFS runs on top of UBI (Unsorted Block Images) — a volume management layer that abstracts wear leveling, bad block management, and ECC from the filesystem.
Application
│
UBIFS ← filesystem with an on-flash B-tree index
│
UBI ← volume manager, wear leveling, bad block handling
│
MTD ← raw NAND access with ECC
│
NAND ← physical flash
UBIFS vs JFFS2:
| Feature | JFFS2 | UBIFS |
|---|---|---|
| Mount time | O(n) — scans all nodes | O(1) — indexed B-tree |
| Write performance | Moderate | High |
| Suitable flash size | < 64 MB | Any size |
| Compression | Yes (zlib/lzo) | Yes (lzo/zstd) |
| Power-fail safety | Yes | Yes |
| Wear leveling | Built-in | Via UBI layer |
| ECC | Via MTD | Via UBI |
# Step 1: Attach MTD device to UBI
ubiattach /dev/ubi_ctrl -m 2 -d 0 # mtd2 → ubi0
# Step 2: Create UBI volume
ubimkvol /dev/ubi0 -n 0 -N rootfs -s 120MiB
# Step 3: Create UBIFS image
# -m : minimum I/O unit size (NAND page size)
# -e : logical erase block size (physical minus 2 pages of UBI overhead)
# -c : maximum number of logical erase blocks
mkfs.ubifs \
-r $ROOTFS \
-o rootfs.ubifs \
-m 2048 \
-e 126976 \
-c 1000
# Step 4: Package into a UBI image (for flashing)
ubinize -o ubi.img -m 2048 -p 131072 ubinize.cfg
# ubinize.cfg:
# [ubifs]
# mode=ubi
# image=rootfs.ubifs
# vol_id=0
# vol_size=120MiB
# vol_type=dynamic
# vol_name=rootfs
# vol_flags=autoresize
# Flash the UBI image
nandwrite -p /dev/mtd2 ubi.img
# Mount UBIFS
mount -t ubifs ubi0:rootfs /mnt
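The `-e 126976` value used in step 3 is not arbitrary: UBI stores its erase-counter and volume-ID headers in the first two pages of each physical erase block, so the logical erase block is smaller by two page sizes:

```shell
# LEB size = PEB size - 2 * page size (UBI's EC header + VID header)
peb=131072   # physical erase block, matches -p in ubinize
page=2048    # NAND page size, matches -m
echo "LEB size: $(( peb - 2 * page ))"   # prints: LEB size: 126976
```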
Filesystem Image Creation
{:.gc-adv}
Advanced
Creating ext2/ext3/ext4 Without Root (genext2fs)
# genext2fs creates ext2 images without root privileges or loop mounts
# -b : image size in 1K blocks (65536 = 64 MB)
# -d : source directory
# -i : bytes per inode
# -U : squash uid/gid to 0
genext2fs \
-b 65536 \
-d $ROOTFS \
-i 4096 \
-U \
rootfs.ext2
# Convert to ext4
tune2fs -O extents,uninit_bg,dir_index rootfs.ext2
e2fsck -f rootfs.ext2
fakeroot — Correct Permissions Without Root
# fakeroot intercepts file ownership calls and tracks them in a database
# Allows creating images with root-owned files as a non-root user
fakeroot -- bash -c '
cp -a $ROOTFS /tmp/staging
chown -R root:root /tmp/staging
chmod 4755 /tmp/staging/bin/su
mksquashfs /tmp/staging rootfs.squashfs -comp zstd
'
Complete Image Creation Table
| Format | Command | Read-Write | Compression | Flash Type | Root Required |
|---|---|---|---|---|---|
| ext4 | mkfs.ext4 | Yes | No | eMMC/SD/HDD | Yes (or loop) |
| ext2 (no root) | genext2fs | Yes | No | eMMC/SD | No |
| SquashFS | mksquashfs | No | Yes | Any | No (with fakeroot) |
| cpio initramfs | find \| cpio | RAM only | With gzip/xz | Any | No |
| JFFS2 | mkfs.jffs2 | Yes | Yes | NOR/NAND MTD | No |
| UBIFS | mkfs.ubifs + ubinize | Yes | Yes | NAND (UBI) | No |
| SquashFS+overlayfs | combined | Apparent yes | Partial | Any | Partial |
Buildroot Image Generation Pipeline
# Buildroot handles all of this automatically — shown here for understanding
# output/images/ contains the final artifacts:
ls output/images/
# rootfs.ext4 ← ext4 for eMMC targets
# rootfs.squashfs ← squashfs for read-only targets
# rootfs.tar.gz ← tarball for NFS or container base
# sdcard.img ← full disk image with partition table
Interview Q&A
{:.gc-iq}
Interview Q&A
Q1 — Why is SquashFS preferred for a production embedded rootfs?
SquashFS provides three key properties for production: (1) power-fail safety — because it is read-only, there is no risk of filesystem corruption from unexpected power loss; (2) atomic OTA — you replace or swap the entire image atomically rather than patching individual files; (3) compression — it reduces flash consumption and can improve read throughput on slow NAND by spending CPU cycles to decompress rather than waiting for I/O. The combination of overlayfs for a writable upper layer gives back the ability to modify files at runtime.
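One way to get the atomicity in practice is the classic rename(2) trick; this sketch uses placeholder filenames and a symlink to the active image (a real updater would also verify the download and update the bootloader environment):

```shell
# Atomic A/B swap of the active rootfs image. rename(2) is atomic, so a
# power cut leaves either the old or the new link in place — never a
# half-written state. Filenames are illustrative.
cd "$(mktemp -d)"
echo v1 > rootfs-A.squashfs
echo v2 > rootfs-B.squashfs
ln -sfn rootfs-A.squashfs current       # currently booting slot A
ln -sfn rootfs-B.squashfs current.new   # stage the new slot
mv -T current.new current               # atomic switch (GNU mv)
readlink current                        # → rootfs-B.squashfs
```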
Q2 — In overlayfs, what is the difference between lowerdir and upperdir, and what happens on a write?
`lowerdir` is the read-only base layer (typically a SquashFS mount). `upperdir` is the read-write layer (typically a tmpfs or ext4 partition). When a file in `lowerdir` is written for the first time, overlayfs performs a copy-up: it copies the entire file from `lowerdir` to `upperdir`, then modifies the copy. Subsequent writes go directly to `upperdir`. Deletion is handled with a whiteout file in `upperdir` that masks the lower-layer entry.
Q3 — What is the default size of a tmpfs mount if no size= option is given?
By default, tmpfs is limited to half of physical RAM. On a system with 512 MB RAM, an unconfigured tmpfs can grow to 256 MB. This is why `/etc/fstab` entries for `/tmp`, `/run`, and similar should always specify an explicit `size=` limit, especially on memory-constrained embedded systems.
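To see that ceiling on a running target (Linux-only; reads MemTotal from /proc/meminfo):

```shell
# Default tmpfs ceiling is half of physical RAM
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
echo "default tmpfs limit: $(( mem_kb / 2 )) kB"
```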
Q4 — What are the three ext4 journal modes, and when would you use each?
- `data=journal` — both data and metadata go through the journal. Safest, slowest. Rarely used.
- `data=ordered` — data is written to disk before its metadata is committed to the journal. Default mode. Good balance of safety and performance.
- `data=writeback` — metadata is journaled but data write-ordering is not guaranteed. Fastest, but file data can be stale after a crash (though the metadata is never corrupt). Preferred for embedded flash where write endurance matters more than ordered-write guarantees.
Q5 — When should you choose UBIFS over JFFS2?
UBIFS should be chosen for any raw NAND partition larger than ~64 MB. JFFS2’s mount time scales linearly with the number of nodes in the filesystem — on a 256 MB NAND partition it can take 30–60 seconds to mount at boot. UBIFS uses a B-tree index stored on the UBI volume, so mount time is near-constant regardless of size. JFFS2 remains appropriate for small NOR flash chips (typically < 16 MB) where the UBI overhead is not worth the setup complexity.
Q6 — Why should embedded systems use noatime in their mount options?
Every file read updates the `atime` (access time) field in the inode, which requires a write to flash. On a system that reads files frequently but does not need access-time tracking (virtually all embedded applications), this causes pointless write amplification — shortening flash lifespan and reducing performance. `noatime` disables these writes entirely. `relatime` (the Linux default since 2.6.30) is a compromise that only updates atime if it is older than mtime or once per day, but `noatime` is still preferred for flash-heavy embedded targets.
Q7 — How do you create a disk image with a partition table for an eMMC target?
# Create a 1 GB disk image with MBR partition table
dd if=/dev/zero of=sdcard.img bs=1M count=1024
# Partition: 64 MB boot (FAT32) + rest as rootfs (ext4)
parted -s sdcard.img \
mklabel msdos \
mkpart primary fat32 4MiB 68MiB \
mkpart primary ext4 68MiB 100% \
set 1 boot on
# Format partitions via loop device
sudo losetup -fP sdcard.img
LOOP=$(losetup -j sdcard.img | cut -d: -f1)
sudo mkfs.vfat -F32 -n BOOT ${LOOP}p1
sudo mkfs.ext4 -L rootfs ${LOOP}p2
# Copy content, then:
sudo losetup -d $LOOP
References
{:.gc-ref}
References
| Resource | Link |
|---|---|
| Linux kernel — SquashFS documentation | kernel.org/doc/html/latest/filesystems/squashfs.html |
| Linux kernel — overlayfs documentation | kernel.org/doc/html/latest/filesystems/overlayfs.html |
| Linux kernel — UBIFS documentation | kernel.org/doc/html/latest/filesystems/ubifs.html |
| MTD (Memory Technology Devices) | linux-mtd.infradead.org |
| mtd-utils project | git.infradead.org/mtd-utils.git |
| ext4 wiki | ext4.wiki.kernel.org |
| Buildroot filesystem generation | buildroot.org/downloads/manual/manual.html#_filesystem_images |
| man 8 mkfs.ext4 | ext4 filesystem creation |
| man 8 mksquashfs | SquashFS image creation |
| man 8 mkfs.ubifs | UBIFS image creation |
| Bootlin embedded Linux slides | bootlin.com/doc/training/embedded-linux/ |