Sunil Khorwal

Building a Minimal 32-bit OS from Scratch: Bootloader, GDT, and a Bare-Metal C Kernel

Published on January 17, 2026  ·  Systems Programming, x86, NASM, C

I wanted to understand what actually happens between pressing the power button and seeing an OS shell. Every tutorial says "the BIOS loads your bootloader" — but what does that concretely mean in assembly? I built mini-os to answer that question from first principles: a complete x86 OS in NASM and C, no GRUB, no libc, no abstraction layer.


Stage 1 — The Boot Sector: 512 Bytes to Rule Them All

When a PC powers on, the BIOS scans for a bootable drive and loads its first 512 bytes, the boot sector, into memory at the fixed address 0x7C00. It then jumps to that address. You have exactly 510 usable bytes (the last two must hold the boot signature 0xAA55, stored as the bytes 0x55 then 0xAA), and you're running in 16-bit Real Mode, where the only services on offer are BIOS interrupts.
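As a rough illustration of that layout, here is a small host-side C check (not part of mini-os; the function name is hypothetical) that tests whether a 512-byte buffer ends with the boot signature. Because x86 is little-endian, the word 0xAA55 sits on disk as 0x55 at offset 510 and 0xAA at offset 511.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Returns 1 if the buffer looks like a bootable sector: the word 0xAA55
   stored little-endian in the final two bytes (offsets 510 and 511). */
int has_boot_signature(const uint8_t *sector, size_t len)
{
    if (len != SECTOR_SIZE)
        return 0;
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

Running a check like this over boot.bin before building the image is a cheap way to catch a missing padding directive or a boot sector that has grown past 512 bytes.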

The first thing my bootloader does is save the boot drive number that the BIOS puts in dl, then zero out all segment registers. The stack pointer is set to 0x7C00, so the stack grows downward, away from our code, which sits at and above that address. Then the disk read loop begins:

boot.asm — Disk load loop using BIOS int 0x13
    mov bx, 0x1000          ; ES:BX = destination address for kernel (ES = 0)
    mov si, 5               ; sectors left to read (DH is the head number for int 0x13, so count in SI)
    mov cl, 2               ; start from sector 2 (sector 1 = us)
.load_loop:
    mov ah, 0x02            ; BIOS read sector function
    mov al, 1               ; read one sector at a time
    mov ch, 0               ; cylinder 0
    mov dh, 0               ; head 0
    mov dl, [BOOT_DRIVE]    ; drive number saved at startup
    int 0x13                ; call BIOS
    jc disk_error           ; carry flag set = read failed
    add bx, 512             ; advance destination by one sector
    inc cl
    dec si
    jnz .load_loop

Reading sector by sector rather than in bulk lets me catch errors on each individual read via the carry flag. If any sector fails, execution jumps to disk_error and the CPU halts. The five sectors land at 0x1000 through 0x19FF, well clear of the BIOS data area (which ends below 0x500) and of our own boot sector at 0x7C00.
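To make the layout concrete, this host-side C sketch (the helper names are mine, not from the repository) maps a 1-based sector number to its byte offset in the disk image and to the address where the loop stores it. It assumes the sectors are read from cylinder 0, head 0, with the load base of 0x1000 described above.

```c
#include <stdint.h>

#define SECTOR_SIZE  512u
#define KERNEL_BASE  0x1000u   /* where the bootloader starts storing the kernel */
#define FIRST_SECTOR 2u        /* sector 1 is the boot sector itself */

/* Byte offset inside the raw image of a 1-based sector on cylinder 0, head 0. */
uint32_t image_offset(uint32_t sector)
{
    return (sector - 1u) * SECTOR_SIZE;
}

/* Memory address at which the load loop places that sector. */
uint32_t load_address(uint32_t sector)
{
    return KERNEL_BASE + (sector - FIRST_SECTOR) * SECTOR_SIZE;
}
```

For example, sector 2 comes from image offset 512 and is stored at 0x1000, while the fifth and last sector read (sector 6) is stored at 0x1800.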


Stage 2 — The GDT and the Switch to Protected Mode

Real Mode's 16-bit segmented addressing caps you at 1MB of memory and gives you no memory protection, no privilege rings, nothing. To run 32-bit C code you must switch the CPU to Protected Mode. That requires loading a Global Descriptor Table (GDT).
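For reference, the 1MB ceiling follows from how Real Mode forms addresses: the 16-bit segment is shifted left by four bits and added to the 16-bit offset. A minimal C sketch of that arithmetic (the function name is illustrative):

```c
#include <stdint.h>

/* Real Mode linear address: segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With the segment registers zeroed, 0x0000:0x7C00 is simply 0x7C00. The largest pair, 0xFFFF:0xFFFF, works out to 0x10FFEF, only slightly above the 1MB mark.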

The GDT tells the CPU how memory segments are laid out. I defined three entries: a mandatory null descriptor, a code segment, and a data segment — both using flat 4GB descriptors:

boot.asm — GDT definitions
gdt_start:
    dq 0                        ; null descriptor (required)
gdt_code:
    dq 0x00CF9A000000FFFF       ; 32-bit code, base=0, limit=4GB, ring 0
gdt_data:
    dq 0x00CF92000000FFFF       ; 32-bit data, base=0, limit=4GB, ring 0
gdt_end:

gdt_descriptor:
    dw gdt_end - gdt_start - 1  ; GDT size minus 1
    dd gdt_start                ; linear address of GDT
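The two non-null descriptors above can be rebuilt from their individual fields, which makes the magic constants easier to audit. The following C sketch is a host-side illustration only (the function name and parameter split are my own, not from boot.asm); it packs a base, a 20-bit limit, an access byte, and a flags nibble into the 64-bit descriptor layout.

```c
#include <stdint.h>

/* Pack a segment descriptor from its fields.
   base:   32-bit segment base
   limit:  20-bit segment limit
   access: access byte (present bit, privilege level, type bits)
   flags:  four flag bits (granularity, size, long mode, available) */
uint64_t make_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);             /* bits 0-15:  limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* bits 16-39: base 23:0   */
    d |= (uint64_t)access << 40;                  /* bits 40-47: access byte */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* bits 48-51: limit 19:16 */
    d |= (uint64_t)(flags & 0xFu) << 52;          /* bits 52-55: flags       */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;  /* bits 56-63: base 31:24  */
    return d;
}
```

Calling make_descriptor(0, 0xFFFFF, 0x9A, 0xC) reproduces the code descriptor 0x00CF9A000000FFFF, and swapping the access byte for 0x92 reproduces the data descriptor.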

The value 0x00CF9A000000FFFF packs the access byte and flags: present bit set, ring 0, execute/read for code, 32-bit segment, and a granularity bit that scales the 20-bit limit in 4KB pages (so (0xFFFFF + 1) × 4096 = 4GB). After loading the GDT with lgdt [gdt_descriptor], I set bit 0 of cr0 (the protection enable bit) and do a far jump, which loads cs with the code selector and flushes the instruction pipeline:

boot.asm — Entering Protected Mode
    cli                             ; disable interrupts
    lgdt [gdt_descriptor]           ; load our GDT
    mov eax, cr0
    or eax, 1                       ; set the protection enable bit
    mov cr0, eax
    jmp CODE_SEG:protected_mode     ; far jump, flushes pipeline
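The CODE_SEG operand of that far jump is a segment selector, not an address. Its definition is not shown here; assuming it is the usual gdt_code - gdt_start, it equals the byte offset of the descriptor inside the GDT. A small C sketch of how a selector is composed (the function name is hypothetical):

```c
#include <stdint.h>

/* Build a segment selector: table index in bits 3-15, table indicator
   (0 = GDT, 1 = LDT) in bit 2, requested privilege level in bits 0-1. */
uint16_t make_selector(uint16_t index, uint8_t use_ldt, uint8_t rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}
```

Under that assumption the code segment (GDT entry 1, ring 0) has selector 0x08 and the data segment (entry 2) has selector 0x10, which are the values loaded into cs and into the data segment registers respectively.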

Stage 3 — The C Kernel and VGA Direct Write

Inside protected_mode, I reload all segment registers to point at the data descriptor, set the stack to 0x90000, and call the address 0x1000, which is where the kernel binary was loaded. Because kernel_main() is the first code in that flat binary, execution transfers straight into C.

There's no printf, no heap, no syscalls. To display text I write directly to the VGA text-mode buffer at 0xB8000. Each character occupies two bytes — the ASCII value and a colour attribute byte. 0x0F means white text on black background.

kernel.c — Full kernel source
void kernel_main(void) {
    // volatile keeps the compiler from optimizing away the memory-mapped stores
    volatile char* vga = (volatile char*)0xB8000;
    const char* msg = "Hello from Kernel";
    for (int i = 0; msg[i]; i++) {
        vga[i * 2] = msg[i];        // ASCII character
        vga[i * 2 + 1] = 0x0F;      // white on black
    }
    while (1) { __asm__("hlt"); }   // halt until next interrupt (none, they are disabled)
}
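The kernel above always writes from the top-left cell with a fixed attribute. As a sketch of how the same buffer layout generalizes, the helpers below (hypothetical names, and assuming the standard 80-column text mode) compute an attribute byte from foreground and background colours and the byte offset of an arbitrary cell.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80   /* assumed: standard 80x25 text mode */

/* Attribute byte: background colour in the high nibble, foreground in the low. */
uint8_t vga_attr(uint8_t fg, uint8_t bg)
{
    return (uint8_t)((bg << 4) | (fg & 0x0F));
}

/* Byte offset from 0xB8000 of the cell at (row, col); each cell is two bytes. */
size_t vga_cell_offset(size_t row, size_t col)
{
    return (row * VGA_COLS + col) * 2;
}
```

Here vga_attr(15, 0) gives the 0x0F used in the kernel, and vga_cell_offset(1, 0) is 160, the start of the second line.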

The Linker Script and Cross-Compiler Toolchain

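This is where most tutorials skip something important. Compiling kernel C code with your host system's gcc is fragile: by default it targets the host ABI (often 64-bit), many distributions build it to emit position-independent executables, and it will try to link against glibc and the host startup files. You can fight that with flags, but a bare-metal cross-compiler is the more reliable route.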

I built an i686-elf cross-compiler toolchain and used these flags:

Makefile — Cross-compilation flags
CC = i686-elf-gcc
LD = i686-elf-ld
CFLAGS = -ffreestanding -m32 -fno-pie -nostdlib
# -ffreestanding: compile for a freestanding environment (no hosted standard library, no assumption that main is the entry point)
# -m32: generate 32-bit code (already the default for i686-elf-gcc)
# -nostdlib: don't link startup files, libc, or libgcc automatically
# -fno-pie: disable position-independent executable

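The linker script (linker.ld) tells the linker to place the kernel at address 0x1000, matching exactly where the bootloader jumps. It also names kernel_main as the entry point, but a flat binary has no header to record an entry address, so what actually matters is that kernel_main is the first code in .text: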

linker.ld
ENTRY(kernel_main)
SECTIONS {
    . = 0x1000;
    .text   : { *(.text*) }
    .rodata : { *(.rodata*) }
    .data   : { *(.data*) }
    .bss    : { *(.bss*) }
}

Finally, the Makefile concatenates the 512-byte boot sector binary and the flat kernel binary into one image file, which QEMU boots directly as a raw disk:

Building and running
cat boot/boot.bin kernel/kernel.bin > os-image.bin
qemu-system-i386 -drive file=os-image.bin,format=raw
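One constraint is implicit in this build: the boot loader reads a fixed five sectors, so the kernel binary has to fit in 5 × 512 = 2560 bytes. The C sketch below (helper names are hypothetical, and the sector count is taken from the disk load loop earlier in the post) shows the arithmetic a build step could use to check that.

```c
#include <stdint.h>

#define SECTOR_SIZE    512u
#define SECTORS_LOADED 5u   /* sector count read by the boot loader */

/* Number of whole sectors needed to hold a kernel binary of the given size. */
uint32_t sectors_needed(uint32_t kernel_bytes)
{
    return (kernel_bytes + SECTOR_SIZE - 1u) / SECTOR_SIZE;
}

/* Returns 1 if the boot loader's fixed read covers the whole kernel. */
int kernel_fits(uint32_t kernel_bytes)
{
    return sectors_needed(kernel_bytes) <= SECTORS_LOADED;
}
```

A kernel of exactly 2560 bytes still fits, while one more byte would need a sixth sector that the loader never reads. The reverse case is worth watching too: if kernel.bin is much smaller than five sectors, the image ends before the last sector the loader requests, and some BIOSes or emulators may report a read error, so padding the image out to the full length is a common precaution.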

Seeing "Hello from Kernel" appear on a black screen — output that I generated by writing directly to hardware memory addresses, through assembly I wrote byte by byte — is a different kind of satisfaction than any higher-level project. Every byte in that output is accountable.