magic inside a "hello world"
what happens when you write and execute a hello world program?
our program -
# hello.py
print("Hello, World!")
and we execute -
python3 hello.py
steps involved to create mind map -
python3 hello.py → photons in eyes
PHASE 1 — SHELL
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▸ You type: python3 hello.py + Enter
The shell (bash/zsh) is itself a running program. It reads your
keystrokes and now has a command string to act on.
│
▼
▸ Shell searches the PATH environment variable
PATH is a list of folders. Shell checks each one for a file named
"python3". Finds /usr/bin/python3. Stops.
│
▼
▸ Shell calls fork() — clones itself
fork() is a kernel syscall. It makes an exact copy of the shell
process. The copy (child) will become python3. Parent shell waits.
│
▼
▸ Child calls execve("/usr/bin/python3", argv, envp)
execve is a syscall that says "throw away my current memory and load
this file instead". The child stops being bash.
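The four shell steps above can be sketched in Python, whose os module exposes the same syscalls. This is a minimal sketch for a Unix-like system, not how bash is actually implemented; find_in_path and run_command are hypothetical helper names.

```python
import os
import sys


def find_in_path(name, path=None):
    """Return the first executable called `name` in the PATH folders, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for folder in path.split(os.pathsep):
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # first match wins, the search stops here
    return None


def run_command(executable, argv):
    """fork() a child, execve() the program in it, and wait in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the new program.
        try:
            os.execve(executable, argv, os.environ)
        finally:
            os._exit(127)  # only reached if execve failed
    # Parent: block until the child exits, like a shell waiting for a command.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For example, run_command(sys.executable, [sys.executable, "hello.py"]) would run the script in a child process and return its exit code.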
│
▼
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PHASE 2 — KERNEL LOADS THE BINARY (ELF format)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▸ Kernel reads the ELF header (first 64 bytes of python3 file)
ELF is the file format for Linux programs. The header is a map:
"code starts at byte X, data at byte Y, entry point at address Z".
│
▼
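The ELF header described above can be decoded with Python's struct module. This is a minimal sketch that assumes a 64-bit little-endian ELF file; parse_elf_header and the dictionary keys are hypothetical names chosen for illustration.

```python
import struct

# Layout of the 64-byte header of a 64-bit little-endian ELF file.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_elf_header(data):
    """Decode the first 64 bytes of a 64-bit little-endian ELF file."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("need at least 64 bytes")
    (ident, e_type, machine, version, entry, phoff, shoff,
     flags, ehsize, phentsize, phnum, shentsize, shnum,
     shstrndx) = ELF64_HEADER.unpack(data[:ELF64_HEADER.size])
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    return {
        "entry_point": entry,             # address where execution starts
        "program_headers_at": phoff,      # file offset of the segment table
        "program_header_count": phnum,
        "section_headers_at": shoff,      # file offset of the section table
    }
```

On a Linux machine you could feed it the first 64 bytes of /usr/bin/python3, read with open(..., "rb").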
▸ Kernel maps segments into RAM
The program headers group sections into loadable segments: .text
(machine code, read+execute), .rodata (constants, read-only), .data
(globals, read+write). Each segment is mapped into its own memory
pages with matching permissions.
│
▼
▸ Kernel hands off to ld-linux.so (the dynamic linker)
python3 doesn't contain all code itself. It says "I need libc.so.6,
libm.so…". ld-linux.so loads those .so files into RAM too.
│
▼
▸ ld-linux.so patches function pointers (relocation)
python3's call to write() is a blank slot. Linker fills it in:
"that slot now points to address 0x7f…44 in libc.so.6". Done once,
either at load time or lazily on the first call.
│
▼
▸ CPU jumps to _start → __libc_start_main() → main()
_start is the real first instruction (not main). The kernel has
already placed argc/argv/envp on the stack. libc sets up the C
runtime (thread-local storage, atexit handlers), then calls CPython's
main().
│
▼
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PHASE 3 — CPYTHON INITIALISES THE PYTHON RUNTIME
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▸ main() calls Py_InitializeFromConfig()
Sets up pymalloc (Python's own memory allocator), GIL (Global
Interpreter Lock — only one thread runs Python at a time),
interpreter state.
│
▼
▸ Built-in types and modules are created in memory
int, str, list, dict objects are born. sys, builtins, _io modules
load. site.py runs, adding site-packages to sys.path.
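The result of this initialisation is visible from inside any running interpreter. The snippet below is a small sketch; runtime_snapshot is a hypothetical helper name, and which modules count as compiled-in can differ between Python builds.

```python
import builtins
import sys


def runtime_snapshot():
    """Collect a few facts about the state CPython set up before our code ran."""
    return {
        # The bare name `print` resolves to the object stored in builtins.
        "print_is_builtin": builtins.print is print,
        # Modules compiled into the interpreter binary itself.
        "compiled_in_modules": [
            name for name in ("sys", "builtins", "_io")
            if name in sys.builtin_module_names
        ],
        # Entries that site.py typically appends to sys.path.
        "site_packages_dirs": [p for p in sys.path if p.endswith("site-packages")],
    }
```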
│
▼
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PHASE 4 — READ hello.py FROM DISK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▸ CPython calls fopen("hello.py") via libc
fopen() calls the open() syscall (openat() on modern glibc). Kernel
opens the file and returns an fd, a file descriptor. fd is just a
small integer (e.g. 5) that acts as a handle. By convention fd=0
stdin, fd=1 stdout, fd=2 stderr are already taken. Your file gets
the lowest free number.
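Python's os.open and os.read are thin wrappers over these syscalls, so the fd can be observed directly. A minimal sketch follows; read_with_raw_fds is a hypothetical helper name, and the exact fd number you get depends on what the process already has open.

```python
import os


def read_with_raw_fds(path, chunk_size=4096):
    """Read a whole file via os.open/os.read and return (fd number, bytes)."""
    fd = os.open(path, os.O_RDONLY)  # kernel returns a small integer handle
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)  # the read(fd, buf, n) syscall
            if not chunk:                    # empty result means end of file
                break
            chunks.append(chunk)
        return fd, b"".join(chunks)
    finally:
        os.close(fd)  # frees the number so the kernel can reuse it
```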
│
▼
▸ CPython calls fread() → libc calls read(5, buf, n)
libc's read() puts syscall number 0 in register rax, fd=5 in rdi,
buffer address in rsi, byte count in rdx. Then fires the syscall
instruction (bytes 0F 05).
│
▼
▸ Kernel sys_read() → VFS → ext4 → page cache check
VFS (Virtual File System) is an abstraction layer — same API
whether the file is on ext4, NTFS, or a network drive. Checks RAM
cache first.
│
▼
▸ Cache miss → NVMe driver → SSD → DMA into RAM
DMA (Direct Memory Access): the SSD controller writes data straight
into RAM without the CPU doing it byte-by-byte. CPU gets an
interrupt when done.
│
▼
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PHASE 5 — LEX → PARSE → COMPILE TO BYTECODE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▸ Lexer scans the raw text character by character
Breaks source into tokens: NAME "print", LPAR "(", STRING "Hello,
World!", RPAR ")". Like splitting a sentence into individual words.
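The standard library's tokenize module produces a comparable token stream, which makes this step easy to inspect. It is a tool-facing tokenizer rather than the exact code path the interpreter runs, so treat this as an illustration; list_tokens is a hypothetical helper name.

```python
import io
import tokenize


def list_tokens(source):
    """Return (token type name, text) pairs for a piece of Python source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]
```

Besides the four tokens listed above, the stream also ends with NEWLINE and ENDMARKER tokens.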
│
▼
▸ Parser builds an AST from the tokens
AST is a tree representing the meaning of your code: "a Call node,
whose func is Name('print'), whose args contain Constant('Hello…')".
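The ast module exposes the same tree shape. The sketch below assumes the source is a single call to a plain name with constant arguments, as in hello.py; describe_call is a hypothetical helper name.

```python
import ast


def describe_call(source):
    """Summarise the Call node of a one-line source such as hello.py."""
    tree = ast.parse(source)
    call = tree.body[0].value  # Module -> Expr statement -> Call
    return {
        "node": type(call).__name__,
        "func": call.func.id,
        "args": [arg.value for arg in call.args],
    }
```

To see the whole tree instead of a summary, ast.dump(ast.parse(source)) prints every node.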
│
▼
▸ Compiler walks the AST and emits bytecode
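Bytecode is NOT machine code. It's instructions for the Python VM:
PUSH_NULL, LOAD_NAME print, LOAD_CONST 'Hello…', CALL 1 (exact
opcodes vary by Python version). Imported modules are cached as
.pyc files. A script run directly, like hello.py, is compiled in
memory and not saved.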
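The built-in compile() and the dis module let you look at the bytecode for yourself. A minimal sketch; bytecode_for is a hypothetical helper name, and the opcode names you see depend on the Python version (module-level code loads print with LOAD_NAME on the versions I am aware of).

```python
import dis


def bytecode_for(source, filename="hello.py"):
    """Compile source and return (code object, list of opcode names)."""
    code = compile(source, filename, "exec")
    opnames = [instr.opname for instr in dis.get_instructions(code)]
    return code, opnames
```

The returned code object carries the constants pool (code.co_consts) and the names it refers to (code.co_names). Calling dis.dis(code) prints a full listing.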
│
▼
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PHASE 6 — CPYTHON EVAL LOOP EXECUTES BYTECODE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▸ ceval.c: giant C for-loop reads one bytecode instruction at a time
It's a switch statement. Each case handles one opcode. Runs a value
stack: instructions push/pop Python objects onto it like a stack of
plates.
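The loop-plus-value-stack idea can be shown with a toy interpreter. This is a deliberately simplified sketch, not CPython's actual eval loop: run_toy_vm is a hypothetical name, it handles only four made-up-to-match opcodes, and it skips details such as PUSH_NULL.

```python
def run_toy_vm(instructions, names, consts):
    """Execute a tiny subset of stack-machine instructions."""
    stack = []
    pc = 0  # index of the next instruction
    while pc < len(instructions):
        opname, arg = instructions[pc]
        pc += 1
        if opname == "LOAD_NAME":
            stack.append(names[arg])       # push the object bound to a name
        elif opname == "LOAD_CONST":
            stack.append(consts[arg])      # push a constant by index
        elif opname == "CALL":
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*args))      # push the call's return value
        elif opname == "POP_TOP":
            stack.pop()                    # discard the unused result
        else:
            raise ValueError(f"unknown opcode {opname}")
    return stack


program = [("LOAD_NAME", "print"), ("LOAD_CONST", 0), ("CALL", 1), ("POP_TOP", None)]
```

Running run_toy_vm(program, {"print": print}, ["Hello, World!"]) calls the real print through the toy stack.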
│
▼
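▸ LOAD_NAME 'print': dict lookup through 3 scopes
Module-level code uses LOAD_NAME (LOAD_GLOBAL is used inside
functions and skips the locals dict). Python checks: locals dict →
globals dict → builtins dict. At module level, locals and globals
are the same dict. Finds built-in print function in builtins. Pushes
a pointer to it onto the value stack.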
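The lookup order can be emulated with three plain dicts. A minimal sketch of the idea, not CPython's implementation; lookup_name is a hypothetical helper name.

```python
import builtins


def lookup_name(name, local_ns, global_ns):
    """Search locals, then globals, then builtins, like a module-level name load."""
    for scope in (local_ns, global_ns, vars(builtins)):
        if name in scope:
            return scope[name]
    raise NameError(f"name {name!r} is not defined")
```

This also shows why defining your own print in a module shadows the built-in: the globals dict is checked first.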
│
▼
▸ LOAD_CONST — pushes the string object pointer onto the stack
The string "Hello, World!" already exists as a Python object in the
code's constants pool. Just push a pointer — no copying.
│
▼
▸ CALL 1 — pops fn + arg, calls builtin_print() in C
print is a C function inside CPython (Python/bltinmodule.c). The
eval loop hands control directly to that C function.
│
▼
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PHASE 7 — PYTHON IO STACK → libc → SYSCALL INSTRUCTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▸ builtin_print() calls sys.stdout.write() for "Hello, World!" and "\n"
print writes the text and then its end string ("\n" by default) to
sys.stdout, a Python TextIOWrapper object. Think of it as a smart
pipe that knows about text encoding and buffering. It wraps fd=1
(stdout).
│
▼
▸ TextIOWrapper encodes the Python str to bytes (UTF-8)
Python strings are Unicode internally, but a file descriptor only
carries raw bytes. UTF-8 (the usual encoding on Linux) converts each
character to 1–4 bytes. "H" → 0x48, "e" → 0x65…
│
▼
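The encoding step is easy to reproduce with str.encode. A small sketch; utf8_lengths is a hypothetical helper name, and the non-ASCII characters are extra examples that are not part of hello.py.

```python
def utf8_lengths(text):
    """Map each distinct character to the number of UTF-8 bytes it needs."""
    return {ch: len(ch.encode("utf-8")) for ch in text}


# The exact bytes that will eventually be handed to write().
payload = "Hello, World!\n".encode("utf-8")

# ASCII needs 1 byte; other characters need 2 to 4.
samples = utf8_lengths("H\u00e9\u20ac\U0001F600")  # H, e-acute, euro sign, emoji
```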
▸ BufferedWriter batches bytes, flushes to FileIO (holds fd=1)
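Syscalls are slow. Buffering collects many small writes into one
big syscall. When stdout is a terminal, the "\n" at the end makes
the buffer flush immediately (line buffering). When stdout is a file
or pipe, bytes wait until the buffer fills or the program exits.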
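The same three-layer stack can be built by hand on top of a pipe to watch when bytes actually reach the kernel. This is a sketch under assumptions: write_through_io_stack is a hypothetical helper, and a pipe stands in for the terminal so the result can be read back.

```python
import io
import os


def _read_available(fd):
    """Return whatever is waiting in a non-blocking pipe, or b'' if nothing is."""
    try:
        return os.read(fd, 1024)
    except BlockingIOError:
        return b""


def write_through_io_stack(text, line_buffering):
    """Write text via TextIOWrapper -> BufferedWriter -> FileIO over a pipe.

    Returns (bytes visible before an explicit flush, bytes visible after it).
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    raw = io.FileIO(write_fd, "w")          # holds the fd
    buffered = io.BufferedWriter(raw)       # batches bytes
    wrapper = io.TextIOWrapper(buffered, encoding="utf-8", newline="\n",
                               line_buffering=line_buffering)
    wrapper.write(text)
    before_flush = _read_available(read_fd)
    wrapper.flush()
    after_flush = _read_available(read_fd)
    wrapper.close()                         # also closes write_fd
    os.close(read_fd)
    return before_flush, after_flush
```

With line_buffering=True the newline pushes the bytes out at once. With line_buffering=False they only appear after the explicit flush.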
│
▼
▸ FileIO → libc write(1, buf, 14) → assembly → syscall
libc's write() boils down to ~5 assembly lines:
mov rax, 1 ; syscall number for write
mov rdi, 1 ; fd = stdout
mov rsi, bufaddr ; pointer to "Hello, World!\n"
mov rdx, 14 ; byte count
syscall ; assembled to bytes 0F 05
│
▼
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PHASE 8 — INSIDE THE CPU: HOW SYSCALL EXECUTES IN HARDWARE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▸ FETCH — CPU loads the next instruction bytes from RAM
The RIP register (instruction pointer) holds the address of the
next instruction. CPU's fetch unit grabs the bytes from L1 cache
(a tiny, ultra-fast memory right next to the cores). This happens
every clock tick — ~3 billion times per second.
│
▼
▸ DECODE — decoder recognises bit pattern 0F 05 as "syscall"
The decoder is hardwired transistor logic — not software. It
pattern-matches the opcode against thousands of known instructions.
The exact transistors that recognise 0F 05 are physically etched
into the CPU die. Result: "this is the syscall instruction".
│
▼
▸ DISPATCH — control unit triggers the syscall microcode handler
Complex instructions like syscall aren't a single hardware action —
they're orchestrated by microcode, a tiny program baked into the
CPU itself. The control unit hands off to that microcode, which
performs the next several steps automatically (no software).
│
▼
▸ Hardware switches privilege: Ring 3 → Ring 0
CPUs have 4 privilege rings (0–3). User programs run in Ring 3
(restricted). The kernel runs in Ring 0 (full access). The ring
level is stored in the CS register. Transistors check this register
before every sensitive instruction. "syscall" forces a jump to
Ring 0 — hardware enforced, not software.
│
▼
▸ CPU reads the LSTAR register to find the kernel entry point
LSTAR is a model-specific register (MSR). At boot, the kernel writes
its own syscall handler address into LSTAR. The "syscall" instruction
automatically jumps there. Only Ring 0 code can write to LSTAR, so
only the kernel can set this. User programs can't hijack it.
│
▼
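▸ CPU saves the return address and flags automatically
Before jumping, the syscall instruction copies the current
instruction pointer (where to return) into rcx and the CPU flags
into r11. It does not switch stacks: the kernel's entry code loads
a kernel stack pointer and saves the remaining registers itself, so
execution can resume after the syscall finishes.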
│
▼
▸ Kernel entry point reads rax=1 → syscall table lookup
The kernel has an array: syscall_table[0]=sys_read, [1]=sys_write,
[60]=sys_exit… rax holds the index. Kernel calls
syscall_table[1] = sys_write().
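Table-driven dispatch is simple to model. The following is a toy model in Python, not kernel code: the handlers just forward to os.read and os.write, the names are hypothetical, and sys_exit is left out so that trying it does not end your interpreter.

```python
import errno
import os


def sys_read(fd, count):
    return os.read(fd, count)


def sys_write(fd, data):
    return os.write(fd, data)


# Index = syscall number (the x86-64 numbers used in the text).
SYSCALL_TABLE = {0: sys_read, 1: sys_write}


def dispatch(rax, *args):
    """Pick the handler by number, the way the entry code indexes its table."""
    handler = SYSCALL_TABLE.get(rax)
    if handler is None:
        return -errno.ENOSYS  # unknown numbers yield "function not implemented"
    return handler(*args)
```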
│
▼
▸ Kernel finishes → sysret instruction → Ring 0 back to Ring 3
The kernel restores the user registers, then sysret reloads the
instruction pointer from rcx and the flags from r11. CPU is back in
Ring 3. Result (bytes written = 14) is in rax. libc's write()
returns it.
│
▼
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PHASE 9 — KERNEL ROUTES BYTES TO THE TERMINAL
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▸ sys_write() → VFS → PTY driver
fd=1 (stdout) doesn't point to your screen. It points to one end
of a PTY (pseudo-terminal), a kernel pipe with two ends. Your
program writes to one end. The terminal emulator (gnome-terminal,
iTerm2…) reads from the other. Because fd=1 is just a handle, the
shell can point it at a file instead (e.g. python3 hello.py >
out.txt), and the bytes then bypass the PTY entirely.
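A program can ask the kernel what its fd actually points to. A minimal sketch for Unix-like systems; describe_fd is a hypothetical helper name.

```python
import os
import stat


def describe_fd(fd):
    """Classify what a file descriptor is connected to."""
    if os.isatty(fd):
        return "terminal"
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "pipe"
    return "other"
```

A script that prints describe_fd(1) to stderr should report "terminal" when run directly, "regular file" with stdout redirected to a file, and "pipe" when piped into another command.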
│
▼
▸ Terminal emulator reads bytes from the PTY
It's a normal user-space program. It reads "Hello, World!\n" byte
by byte, interprets control codes (like \n = move cursor down),
then renders text.
│
▼
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PHASE 10 — DISPLAY SIGNAL FLOW: FONT → GPU → CABLE → LCD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▸ Terminal looks up each character in the loaded font file
A font file (TTF/OTF) stores each character as a vector outline
(curves, not pixels). For "H", it fetches the outline and
rasterises it at the current font size into a pixel grid.
│
▼
▸ Pixel data written to the GPU framebuffer (VRAM)
The framebuffer is a big array in GPU memory — one entry per pixel
on screen. Terminal writes the glyph pixels at the current cursor
position's coordinates.
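Writing a glyph into a framebuffer amounts to copying a small pixel grid into a big one. The sketch below is a toy model: the 5x5 bitmap for "H", the list-of-lists framebuffer and the function names are all made up for illustration, and real terminals rasterise from font outlines and usually go through a compositor.

```python
# Hypothetical 5x5 bitmap of the letter H ("X" = lit pixel).
GLYPH_H = [
    "X...X",
    "X...X",
    "XXXXX",
    "X...X",
    "X...X",
]


def make_framebuffer(width, height):
    """One brightness value per pixel, all dark to begin with."""
    return [[0] * width for _ in range(height)]


def blit_glyph(framebuffer, glyph, left, top, value=255):
    """Copy the glyph's lit pixels into the framebuffer at (left, top)."""
    for row, line in enumerate(glyph):
        for col, cell in enumerate(line):
            if cell == "X":
                framebuffer[top + row][left + col] = value
```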
│
▼
▸ GPU display engine scans the framebuffer ~60 times per second
The GPU has dedicated display hardware that reads VRAM row by row,
left to right, top to bottom (scanout). It does this continuously:
60 Hz = one frame every ~16.7 ms.
│
▼
▸ GPU encodes pixels as a digital signal → HDMI / DisplayPort cable
HDMI traditionally sends pixel data as high-speed serial bits using
TMDS encoding. DisplayPort uses its own packet-based encoding. Each
pixel = RGB values. The cable carries this as rapidly alternating
voltages, billions per second.
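The "billions per second" figure follows from simple arithmetic. The sketch below assumes a 1920x1080 display at 24 bits per pixel, which the text does not specify, and it ignores blanking intervals and encoding overhead, so real link rates are higher. The function names are hypothetical.

```python
def frame_time_ms(refresh_hz):
    """Time budget for one full scanout of the framebuffer."""
    return 1000.0 / refresh_hz


def raw_pixel_bits_per_second(width, height, refresh_hz, bits_per_pixel=24):
    """Bits of raw pixel data that must cross the cable every second."""
    return width * height * refresh_hz * bits_per_pixel


# Assumed example display: 1920x1080 at 60 Hz, 8 bits each for R, G and B.
example_rate = raw_pixel_bits_per_second(1920, 1080, 60)
```

For that assumed display the raw pixel data alone is just under 3 billion bits per second.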
│
▼
▸ Monitor receives signal → display driver IC → LCD panel
Monitor's controller chip decodes the serial signal back into
pixel RGB values. For each red, green, and blue subpixel it applies
a voltage to tiny liquid crystals. The crystals change orientation
depending on voltage, blocking or passing backlight through a
colour filter. Different voltages = different brightness per
subpixel = different colour.
│
▼
▸ Backlight photons pass through the LCD → reach your eyes
Your retina's cone cells detect the wavelengths. Brain interprets
the pattern of light/dark pixels as letters. You read:
Hello, World!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
LEGEND
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Phase 1–2 : Shell + kernel loader
Phase 3 : CPython runtime init
Phase 4 : Disk read pipeline
Phase 5–6 : Python parsing + bytecode VM
Phase 7 : Python IO / libc / syscall setup
Phase 8 : CPU hardware internals
Phase 9 : Kernel → terminal routing
Phase 10 : Display pipeline (GPU → screen)