Building a Minimal 32-bit OS from Scratch: Bootloader, GDT, and a Bare-Metal C Kernel
Published on January 17, 2026 · Systems Programming, x86, NASM, C
I wanted to understand what actually happens between pressing the power button and seeing an OS shell. Every tutorial says "the BIOS loads your bootloader" — but what does that concretely mean in assembly? I built mini-os to answer that question from first principles: a complete x86 OS in NASM and C, no GRUB, no libc, no abstraction layer.
Stage 1 — The Boot Sector: 512 Bytes to Rule Them All
When a PC powers on, the BIOS scans for a bootable drive and loads its first 512 bytes, the boot sector, into memory at the fixed address 0x7C00. It then jumps to that address. You have exactly 510 usable bytes (the last two must be the magic signature 0xAA55, stored on disk as the bytes 0x55 then 0xAA), and you're running in 16-bit Real Mode with almost no hardware awareness.
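To make the layout concrete, here is a small host-side C sketch (not part of the bootloader itself) that checks a 512-byte buffer for the signature. The helper names are hypothetical. The point is the byte order: 0xAA55 is a little-endian word, so 0x55 sits at offset 510 and 0xAA at offset 511.

```c
#include <stdint.h>

#define BOOT_SECTOR_SIZE 512u
#define BOOT_CODE_BYTES  510u   /* everything before the two signature bytes */

/* Reads the last two bytes of the sector as a little-endian 16-bit word. */
static uint16_t boot_signature_word(const uint8_t sector[BOOT_SECTOR_SIZE])
{
    return (uint16_t)(sector[BOOT_CODE_BYTES] |
                      ((uint16_t)sector[BOOT_CODE_BYTES + 1] << 8));
}

/* Returns 1 if the buffer ends with 0x55, 0xAA (the word 0xAA55), else 0. */
static int has_boot_signature(const uint8_t sector[BOOT_SECTOR_SIZE])
{
    return boot_signature_word(sector) == 0xAA55u;
}
```

Running a check like this against the assembled boot sector is a quick way to catch a binary that is not exactly 512 bytes or has the signature bytes swapped.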
The first thing my bootloader does is save the boot drive number that the BIOS puts in dl, then zero out all segment registers. The stack pointer is set to 0x7C00; the stack grows downward from there, away from our code, which occupies 0x7C00 and above. Then the disk read loop begins:
Reading sector by sector rather than in bulk lets me check the carry flag, which the BIOS sets on error, after each individual read. If any sector fails, the bootloader halts the CPU. The kernel lands at 0x1000, well clear of the BIOS data area and our own boot sector.
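The control flow of that per-sector loop can be modelled in C on the host. This is a sketch under assumptions, not the bootloader's actual assembly: read_sector_fn is a hypothetical stand-in for one BIOS read, with a nonzero return playing the role of the carry flag, and fake_read is a simulated disk whose sector 4 always fails.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Hypothetical stand-in for one BIOS sector read.
 * Returns 0 on success, nonzero on failure (modelling "carry flag set"). */
typedef int (*read_sector_fn)(unsigned sector, uint8_t *dest);

/* Reads `count` sectors one at a time, advancing the destination by one
 * sector after each success. Stops at the first failure and returns the
 * number of sectors that were read successfully. */
static unsigned load_sectors(read_sector_fn read_one, unsigned first,
                             unsigned count, uint8_t *dest)
{
    unsigned done;
    for (done = 0; done < count; done++) {
        if (read_one(first + done, dest + (size_t)done * SECTOR_SIZE) != 0)
            break;                      /* error on this sector: give up */
    }
    return done;
}

/* Simulated disk for host-side experiments: sector 4 is "bad";
 * every other sector is filled with its own sector number. */
static int fake_read(unsigned sector, uint8_t *dest)
{
    if (sector == 4)
        return 1;
    memset(dest, (int)sector, SECTOR_SIZE);
    return 0;
}
```

A caller compares the return value with count and stops if they differ, which mirrors the bootloader halting at the first failed read.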
Stage 2 — The GDT and the Switch to Protected Mode
Real Mode's 16-bit segmented addressing caps you at 1MB of memory and gives you no memory protection, no privilege rings, nothing. To run 32-bit C code you must switch the CPU to Protected Mode. That requires loading a Global Descriptor Table (GDT).
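The 1MB ceiling follows from how Real Mode forms addresses: the 16-bit segment is shifted left by four bits and added to the 16-bit offset. A short C sketch of that arithmetic (the function names are my own, for illustration) shows why 0x0000:0x7C00 and 0x07C0:0x0000 name the same byte, and how the result wraps when only 20 address lines are available, as on the original 8086.

```c
#include <stdint.h>

/* Physical address formed by a Real Mode segment:offset pair. */
static uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* The same address as seen through only 20 address lines:
 * anything at or above 1MB wraps around to low memory. */
static uint32_t real_mode_phys_20bit(uint16_t segment, uint16_t offset)
{
    return real_mode_phys(segment, offset) & 0xFFFFFu;
}
```

The largest pair, 0xFFFF:0xFFFF, computes to 0x10FFEF, just past 1MB, and wraps to 0x0FFEF under the 20-bit mask.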
The GDT tells the CPU how memory segments are laid out. I defined three entries: a mandatory null descriptor, a code segment, and a data segment — both using flat 4GB descriptors:
The code descriptor 0x00CF9A000000FFFF encodes a base of 0, a limit of 0xFFFFF, and the access flags: present bit set, 32-bit segment, execute/read for code, with a granularity bit that scales the limit in 4KB pages (so (0xFFFFF + 1) × 4096 = 4GB). After loading the GDT with lgdt [gdt_descriptor], I set bit 0 of cr0 and do a far jump, which loads cs with the code selector and flushes any instructions the CPU prefetched in 16-bit mode:
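The packing of a descriptor is easy to get wrong by hand, so it can help to reproduce it in C and compare against the constant. The sketch below is a host-side check, not kernel code. gdt_entry and gdt_selector are hypothetical helper names, and the data-segment access byte 0x92 (present, ring 0, read/write data) is an assumption, since only the code descriptor's value appears above.

```c
#include <stdint.h>

/* Packs one 8-byte GDT descriptor.
 * Layout (low to high bits): limit[15:0], base[23:0], access byte,
 * limit[19:16], flags nibble, base[31:24]. */
static uint64_t gdt_entry(uint32_t base, uint32_t limit,
                          uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;
    d |= (uint64_t)access << 40;
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;
    d |= (uint64_t)(flags & 0xFu) << 52;
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;
    return d;
}

/* Selector for GDT entry `index` at ring 0: each entry is 8 bytes,
 * and the low three bits (table indicator, privilege level) are zero. */
static uint16_t gdt_selector(unsigned index)
{
    return (uint16_t)(index << 3);
}
```

With the entries ordered null, code, data, gdt_entry(0, 0xFFFFF, 0x9A, 0xC) reproduces the code descriptor, and the two selectors come out as 0x08 and 0x10.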
Stage 3 — The C Kernel and VGA Direct Write
Inside protected_mode, I reload all segment registers to point at the data descriptor, set the stack to 0x90000, and call the address 0x1000 — which is where the kernel binary was loaded. Execution transfers to kernel_main() in C.
There's no printf, no heap, no syscalls. To display text I write directly to the VGA text-mode buffer at 0xB8000. Each character occupies two bytes: the ASCII value, then a colour attribute byte whose high nibble is the background and low nibble the foreground. 0x0F therefore means white text (0xF) on a black background (0x0).
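A minimal sketch of such a writer in C follows. It is not the mini-os source: vga_cell and vga_puts_at are illustrative names, the 80-column width assumes the standard 80x25 text mode, and the buffer is passed in as a parameter so the same function can be exercised on a host against an ordinary array. There is no bounds checking or scrolling.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS           80     /* assumes standard 80x25 text mode */
#define VGA_WHITE_ON_BLACK 0x0F

/* One screen cell as a little-endian 16-bit word:
 * low byte = ASCII character, high byte = colour attribute. */
static uint16_t vga_cell(char c, uint8_t attr)
{
    return (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
}

/* Writes a NUL-terminated string starting at (row, col).
 * In the kernel, `buf` would be (volatile uint16_t *)0xB8000. */
static void vga_puts_at(volatile uint16_t *buf, size_t row, size_t col,
                        const char *s, uint8_t attr)
{
    size_t i = row * VGA_COLS + col;
    while (*s != '\0')
        buf[i++] = vga_cell(*s++, attr);
}
```

The volatile qualifier matters in the kernel: without it the compiler is free to drop or reorder stores to memory it believes nobody reads.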
The Linker Script and Cross-Compiler Toolchain
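This is where most tutorials skip something important. Compiling kernel C code with your host system's gcc is fragile: by default it targets the host OS and ABI, links against glibc and the host's startup files, and on many distributions emits position-independent executables, none of which suits a kernel loaded at a fixed address. A bare-metal cross-compiler avoids all of that.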
I built an i686-elf cross-compiler toolchain and used these flags:
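The linker script (linker.ld) tells the linker to place the kernel at address 0x1000 and use kernel_main as the entry point, matching exactly where the bootloader jumps. A flat binary has no header to record an entry point, so kernel_main also has to be the first code in the output for the call to 0x1000 to land on it: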
Finally, the Makefile concatenates the 512-byte boot sector binary and the flat kernel binary into one image file, which QEMU boots directly as a raw disk:
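The resulting layout is simple enough to model in C: the boot sector occupies the first 512 bytes and the kernel starts at byte offset 512, the second sector of the disk. The sketch below uses hypothetical helper names and makes one assumption that goes beyond plain concatenation: it zero-pads the kernel to a whole number of sectors, so that a bootloader reading whole sectors never runs past the end of the image.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Number of whole sectors needed to hold `bytes` bytes (rounds up). */
static size_t sectors_for(size_t bytes)
{
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Lays out [boot sector][kernel, zero-padded to a sector boundary] in `out`.
 * Returns the image size in bytes, or 0 if `out_cap` is too small. */
static size_t build_image(const uint8_t boot[SECTOR_SIZE],
                          const uint8_t *kernel, size_t kernel_len,
                          uint8_t *out, size_t out_cap)
{
    size_t padded = sectors_for(kernel_len) * SECTOR_SIZE;
    size_t total = SECTOR_SIZE + padded;

    if (out_cap < total)
        return 0;
    memcpy(out, boot, SECTOR_SIZE);
    memset(out + SECTOR_SIZE, 0, padded);          /* padding bytes */
    memcpy(out + SECTOR_SIZE, kernel, kernel_len); /* kernel at offset 512 */
    return total;
}
```

sectors_for(kernel_len) is also the number of sectors the boot sector's read loop has to fetch, so the two values need to stay in sync as the kernel grows.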
Seeing "Hello from Kernel" appear on a black screen, output that I generated by writing directly to hardware memory addresses through assembly I wrote byte by byte, is a different kind of satisfaction from any higher-level project. Every byte in that output is accounted for.