Registers
The Paradox
Everything in RAM has an address.
Registers don’t.
You cannot take the address of a register in C. You cannot point a pointer at one. The & operator doesn’t work on them. They exist entirely outside the addressable memory space.
And yet — registers are where all computation actually happens.
Not RAM. Registers.
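The CPU cannot add two numbers that are both sitting in RAM. It cannot compare them, shift them, or OR them there. It must first load at least one of them into a register, and even when an x86 instruction names a memory operand directly (add rax, [rdi]), the core loads that value internally before the arithmetic unit sees it. The operation happens in the register. The result lives in a register. Only then, maybe, does it get written back to RAM.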
Every line of C you will ever write, at the bottom, is the compiler figuring out how to choreograph data moving between RAM and registers so that the actual work can happen.
What a Register Physically Is
A register is built from flip-flops.
A flip-flop is a circuit with two stable states — 0 or 1 — that holds its state as long as power is applied, without refreshing. Unlike a DRAM capacitor, it doesn’t leak. It doesn’t need to be periodically rewritten.
It’s made of transistors arranged in a feedback loop: the output feeds back to hold the stored value stable. The classic SRAM cell uses 6 transistors per bit; register-file cells add more for their multiple read and write ports, and edge-triggered flip-flops use more still. Fast. Stable. Expensive in silicon area.
A 64-bit register is 64 of these one-bit cells, sitting physically inside the CPU core, micrometers away from the arithmetic units that operate on them.
Access time: less than one clock cycle. The data is essentially already there.
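The feedback idea is easiest to see in the simplest storage element: a latch built from two cross-coupled NOR gates. Below is a minimal logic-level sketch in C. It is a toy under stated assumptions, not how any real register file is built. SRLatch, nor_gate, and sr_latch_apply are hypothetical names, and the loop simply iterates the two gates until their outputs stop changing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a cross-coupled NOR SR latch: each gate's output feeds the
   other gate's input, and that loop is what holds the stored bit. This
   models the feedback idea only; it is not transistor- or timing-accurate. */
typedef struct {
    bool q;      /* the stored bit */
    bool q_bar;  /* its complement */
} SRLatch;

static bool nor_gate(bool a, bool b) { return !(a || b); }

/* Drive S (set) and R (reset), then iterate the loop until it settles.
   S = R = 1 is the invalid input for a NOR latch, so it is rejected. */
static void sr_latch_apply(SRLatch *l, bool s, bool r) {
    assert(!(s && r));
    for (int i = 0; i < 4; i++) {  /* a few passes suffice to settle */
        bool q = nor_gate(r, l->q_bar);
        bool q_bar = nor_gate(s, q);
        l->q = q;
        l->q_bar = q_bar;
    }
}
```

Once set, the bit stays put while S and R are both 0. That is the "holds its state as long as power is applied" property in miniature.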
The Register File
A CPU core doesn’t have one register. It has a register file — a small, structured set of registers.
On x86-64 — your CPU’s architecture — the programmer-visible general purpose registers are:
RAX RBX RCX RDX
RSI RDI RSP RBP
R8 R9 R10 R11
R12 R13 R14 R15
16 general purpose registers. Each is 64 bits — 8 bytes wide.
Plus:
- RIP — the instruction pointer. Points to the next instruction to execute. You can’t directly write it (well — you can, via jumps and calls).
- RFLAGS — status flags. Zero flag, carry flag, sign flag, overflow flag. Set by arithmetic operations, read by conditional jumps.
- XMM0–XMM15 — 128-bit SSE registers for floating point and SIMD.
- YMM0–YMM15 — 256-bit AVX registers (the lower 128 bits are the XMM registers).
- Various segment registers, control registers, debug registers — mostly kernel territory.
The Naming Hierarchy — This Is Important
x86-64 has backwards compatibility going back to 1978. The registers have sub-register names that reflect this history.
Take RAX:
RAX  64-bit  bits 63..0
EAX  32-bit  bits 31..0
AX   16-bit  bits 15..0
AH    8-bit  bits 15..8   (high byte of AX)
AL    8-bit  bits  7..0   (low byte of AX)
Same physical register. Different width views into it.
Writing to EAX zero-extends into RAX — the upper 32 bits are cleared. This was a deliberate x86-64 design decision.
Writing to AX — only the lower 16 bits change. Upper 48 bits untouched.
Writing to AH or AL — only that byte changes.
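These width rules can be modeled with ordinary 64-bit arithmetic. Here is a minimal sketch that assumes nothing beyond the rules listed above. write_eax, write_ax, write_ah, and write_al are hypothetical helpers. They operate on a uint64_t standing in for RAX, not on the real register.

```c
#include <assert.h>
#include <stdint.h>

/* Models of partial-register writes. Each takes the old 64-bit RAX value
   and returns the new one. Plain arithmetic, no real register access. */

static uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void)rax;                  /* 32-bit writes zero-extend: old bits 63..32 are discarded */
    return (uint64_t)v;
}

static uint64_t write_ax(uint64_t rax, uint16_t v) {
    return (rax & ~UINT64_C(0xFFFF)) | v;                  /* only bits 15..0 change */
}

static uint64_t write_ah(uint64_t rax, uint8_t v) {
    return (rax & ~UINT64_C(0xFF00)) | ((uint64_t)v << 8); /* only bits 15..8 change */
}

static uint64_t write_al(uint64_t rax, uint8_t v) {
    return (rax & ~UINT64_C(0xFF)) | v;                    /* only bits 7..0 change */
}
```

Start from RAX = 0x1122334455667788. write_eax(rax, 0xAABBCCDD) yields 0x00000000AABBCCDD, while write_ax(rax, 0xBEEF) yields 0x112233445566BEEF. The 32-bit write discards the upper half, and the 16-bit write preserves it.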
This matters for exploit development. A function returns a value in RAX. If the function was compiled to write EAX, the upper 32 bits of RAX are zeroed. If it wrote AX, they’re not. Getting this wrong when reading return values causes subtle, catastrophic bugs.
The same sub-register naming applies:
RBX / EBX / BX / BH / BL
RCX / ECX / CX / CH / CL
RDX / EDX / DX / DH / DL
RSI / ESI / SI / SIL
RDI / EDI / DI / DIL
RSP / ESP / SP / SPL
RBP / EBP / BP / BPL
R8 / R8D / R8W / R8B
R9 / R9D / R9W / R9B
... (R10 through R15 same pattern)
What Each Register Is Used For — Calling Convention
Registers aren’t just generic storage. The x86-64 System V ABI (the calling convention your Linux system uses) assigns roles to them:
REGISTER   ROLE                             CALLER/CALLEE SAVED?
─────────────────────────────────────────────────────────────────
RAX        Return value / scratch           Caller-saved
RBX        Callee-saved general purpose     Callee-saved
RCX        4th argument                     Caller-saved
RDX        3rd argument                     Caller-saved
RSI        2nd argument                     Caller-saved
RDI        1st argument                     Caller-saved
RSP        Stack pointer                    Callee-saved (special)
RBP        Frame pointer                    Callee-saved
R8         5th argument                     Caller-saved
R9         6th argument                     Caller-saved
R10        Scratch / static chain           Caller-saved
R11        Scratch                          Caller-saved
R12–R15    Callee-saved general purpose     Callee-saved
RIP        Instruction pointer              Not directly writable
RFLAGS     Condition codes                  Caller-saved
Caller-saved means: if you (the caller) care about the value in that register after making a function call, you must save it before the call. The callee is free to trash it.
Callee-saved means: if the function (callee) uses that register, it must save and restore it. When the function returns, those registers must have the same values they had when the function was entered.
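When you call printf("hello"), the string’s address goes into RDI. That’s not a C concept. That’s metal. An instruction such as lea rdi, [rip + <offset>] (or mov edi, <address> in non-position-independent code) executes before the call printf instruction. Because printf is variadic, the caller also sets AL to the number of vector registers carrying arguments. Here that is zero, typically set via xor eax, eax.
When malloc returns a pointer, it’s in RAX. p = malloc(...) is the compiler emitting mov [address_of_p], rax, or simply leaving the pointer in a register if p never needs to live in memory.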
Why No Address
Registers are not part of the addressable memory space because they are not memory.
They’re not on the memory bus. They’re not in the RAM chips. They’re not behind the memory controller. They’re inside the CPU core itself, wired directly to the ALU (Arithmetic Logic Unit), the load/store units, the instruction decoders.
The CPU reaches a register by hardwired logic: specific bits in the instruction encoding specify which register. The instruction add rax, rbx is commonly encoded as the bytes 48 01 D8. Bit fields in the last byte (the ModRM byte) say “destination: register 0 (RAX), source: register 3 (RBX).” The silicon reads those bits and routes the register file outputs directly to the adder inputs.
No address. No bus transaction. No latency beyond the pipeline stage that reads the register file. It just happens in the combinational logic.
This is why register access takes a fraction of a nanosecond, while a trip to DRAM typically costs on the order of 80 ns. They aren’t even the same class of thing.
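The encoding claim above can be checked by hand. Assume the common encoding 48 01 D8 for add rax, rbx: 48 is the REX.W prefix (64-bit operand size), 01 is the ADD r/m64, r64 opcode, and D8 is the ModRM byte whose fields name the registers. Below is a minimal decoder sketch. It ignores the REX register-extension bits, so it only covers the first eight registers. decode_modrm, ModRM, and REG64 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers 0-7 as they appear in ModRM fields (no REX extension). */
static const char *const REG64[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

typedef struct {
    unsigned mod;  /* bits 7..6: 3 means both operands are registers */
    unsigned reg;  /* bits 5..3: for opcode 01 this is the source */
    unsigned rm;   /* bits 2..0: for opcode 01 this is the destination */
} ModRM;

static ModRM decode_modrm(uint8_t byte) {
    ModRM m = { (byte >> 6) & 3u, (byte >> 3) & 7u, byte & 7u };
    return m;
}
```

For 0xD8 the fields come out as mod = 3 (register operands), reg = 3 (RBX, the source), and rm = 0 (RAX, the destination).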
The Compiler’s Job — Register Allocation
You write:
int a = 1;
int b = 2;
int c = a + b;
int d = c * 3;
The compiler’s job — specifically the register allocator pass — is to figure out which variables live in which registers, when, and when to spill them to the stack because you ran out of registers.
A smart register allocation for the above:
mov eax, 1 ; a → EAX
mov ecx, 2 ; b → ECX
add eax, ecx ; EAX = a + b = 3, this is c
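lea edx, [rax + rax*2] ; EDX = c * 3 = 9, this is d
Notice: a, b, c, d never touched RAM. They lived entirely in registers. No memory access at all. (With constant inputs like these, a real optimizing compiler would go further and fold everything into compile-time constants. The listing shows the allocation it would use when the values aren’t known in advance.)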
A large part of what -O2 optimization does is exactly this: keeping values in registers as long as possible instead of constantly writing them to the stack and reading them back.
The unoptimized version (-O0, used for debugging) religiously writes every variable to the stack and reads it back, so the debugger can always find it at a known stack address. The result is often several times slower; the exact factor depends heavily on the code. That’s the cost of debuggability.
Register Spilling
16 general-purpose registers. Your function might have 40 local variables.
When the register allocator runs out of registers, it spills — writes a register’s current value to a stack slot, frees the register for something else, reloads later when needed.
Spills are expensive. They’re the register allocator admitting defeat and going to memory. The stack lives in RAM, so a spill slot will usually hit in L1 cache. Even so, an L1 hit costs a few cycles on top of the extra store and reload instructions.
Good C code, good compiler flags, and good data structure design minimize spills. Functions with fewer live variables at any one time, fewer arguments, simpler control flow — these allow the compiler to keep more in registers.
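The allocate-or-spill decision can be sketched with linear scan, one classic register allocation technique (Poletto and Sarkar's). Production compilers such as GCC and LLVM use more elaborate allocators. In this minimal sketch, each variable is a live interval [start, end] in instruction order, an assumption made for illustration. When no register is free, the interval that ends furthest in the future is spilled. linear_scan and Interval are hypothetical names.

```c
#include <assert.h>

#define MAX_VARS 64

typedef struct { int start, end; } Interval;  /* defined at start, last used at end */

/* Linear-scan sketch. Intervals must be sorted by start. On return,
   reg[i] >= 0 is the register assigned to variable i, and reg[i] == -1
   means variable i was spilled to a stack slot. */
static void linear_scan(const Interval *iv, int n, int num_regs, int *reg) {
    int active[MAX_VARS];    /* variables currently holding a register */
    int n_active = 0;
    int reg_free[MAX_VARS];
    assert(n <= MAX_VARS && num_regs <= MAX_VARS);
    for (int r = 0; r < num_regs; r++) reg_free[r] = 1;

    for (int i = 0; i < n; i++) {
        /* Expire intervals whose last use is at or before this start:
           a value last read by the defining instruction can share its register. */
        for (int a = 0; a < n_active; ) {
            if (iv[active[a]].end <= iv[i].start) {
                reg_free[reg[active[a]]] = 1;
                active[a] = active[--n_active];
            } else {
                a++;
            }
        }
        /* Use a free register if one exists. */
        int r = -1;
        for (int k = 0; k < num_regs && r < 0; k++)
            if (reg_free[k]) r = k;
        if (r >= 0) {
            reg_free[r] = 0;
            reg[i] = r;
            active[n_active++] = i;
            continue;
        }
        /* Out of registers: spill whichever interval ends last. */
        int victim = -1;   /* position in active[] */
        for (int a = 0; a < n_active; a++)
            if (victim < 0 || iv[active[a]].end > iv[active[victim]].end)
                victim = a;
        if (victim >= 0 && iv[active[victim]].end > iv[i].end) {
            reg[i] = reg[active[victim]];   /* take over its register */
            reg[active[victim]] = -1;       /* the old value goes to the stack */
            active[victim] = i;
        } else {
            reg[i] = -1;                    /* spill the new interval instead */
        }
    }
}
```

Apply this to the a, b, c, d example, with intervals a = [0,2], b = [1,2], c = [2,3], and d = [3,3] (an assumed numbering, one index per statement). Two registers hold everything with no spills, while a single register forces b onto the stack.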
The Programmer-Invisible Registers
Modern CPUs have far more registers than you can see.
Out-of-order execution requires this. The CPU renames the architectural registers (RAX, RBX…) onto a much larger pool of physical registers. Recent x86-64 cores have on the order of 150 to 300 integer physical registers, plus a separate physical file for vector registers.
This is register renaming. When two instructions both want to write RAX, the CPU assigns them to different physical registers and tracks which version is “the real RAX” at any point. This lets the CPU execute those instructions in parallel even though they appear to write the same register.
You never see these. The compiler doesn’t see these. The OS doesn’t see these. They exist entirely within the CPU microarchitecture — the hardware-level implementation that sits below even the ISA (Instruction Set Architecture).
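Renaming can be sketched as a lookup table plus a free list. This minimal model is a deliberate simplification. It gives each write a fresh physical register and updates a register alias table, but it leaves out retirement, freeing of old registers, and recovery from mispredictions. Renamer, renamer_init, and rename_write are hypothetical names, and the physical register count is an arbitrary toy value.

```c
#include <assert.h>

#define NUM_ARCH 16   /* RAX..R15 */
#define NUM_PHYS 64   /* arbitrary toy size; real cores have far more */

typedef struct {
    int rat[NUM_ARCH];        /* register alias table: arch -> physical */
    int free_list[NUM_PHYS];  /* stack of unused physical registers */
    int n_free;
} Renamer;

static void renamer_init(Renamer *rn) {
    for (int a = 0; a < NUM_ARCH; a++) rn->rat[a] = a;  /* identity mapping at start */
    rn->n_free = 0;
    for (int p = NUM_PHYS - 1; p >= NUM_ARCH; p--)
        rn->free_list[rn->n_free++] = p;
}

/* Rename "arch_dst = f(arch_src)": look up the source's current physical
   register, then give the destination a brand-new one. */
static int rename_write(Renamer *rn, int arch_dst, int arch_src, int *phys_src) {
    assert(rn->n_free > 0);           /* a real core would stall here */
    *phys_src = rn->rat[arch_src];    /* read the source mapping first */
    int p = rn->free_list[--rn->n_free];
    rn->rat[arch_dst] = p;            /* later readers of arch_dst see p */
    return p;
}
```

Two back-to-back writes to RAX (architectural register 0) receive different physical registers, so neither has to wait for the other. A later reader of RAX is pointed at the newer one.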
Spectre and Meltdown exploited this speculative, out-of-order execution engine. Instructions that executed speculatively and were later squashed still left traces in the data cache. Cache-timing measurements then recovered secrets from memory that the program technically never had permission to read.
That is what operating at this level looks like.
Context Switching — The Register Save Problem
Your OS runs many processes. The CPU has one set of registers.
When the kernel switches from process A to process B — it must save all of process A’s registers somewhere, and load all of process B’s registers that were saved last time B was running.
This saved register state is the context — stored in a kernel data structure per thread. On x86-64 that’s all 16 general purpose registers, RIP, RFLAGS, the segment registers, and the FPU/SSE/AVX state.
The AVX-512 register state alone is 32 registers × 64 bytes = 2048 bytes, plus eight 64-bit opmask registers, and it may have to be saved and restored on a context switch. This is why heavy AVX-512 use can increase context switch overhead.
Some privilege escalation exploits, the kind that get ring 3 code running as ring 0, target exactly this save/restore mechanism. The kernel trusts the saved register state. An attacker who controls the saved state controls what the registers contain when execution resumes.
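The size arithmetic above can be made concrete with two size-only layouts. These structs are hypothetical and exist only to compare sizes. A real kernel keeps this state in its own structures (on Linux, partly in XSAVE-format buffers), with layouts not modeled here.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t gpr[16];  /* RAX..R15 */
    uint64_t rip;
    uint64_t rflags;
} GprContext;          /* 18 * 8 = 144 bytes */

typedef struct { uint8_t bytes[64]; } Zmm;  /* one 512-bit register */

typedef struct {
    Zmm zmm[32];       /* ZMM0..ZMM31: 32 * 64 = 2048 bytes */
    uint64_t k[8];     /* opmask registers k0..k7: 64 bytes */
} Avx512State;

static_assert(sizeof(GprContext) == 144, "16 GPRs + RIP + RFLAGS");
static_assert(sizeof(Avx512State) == 2048 + 64, "ZMM file + opmasks");
```

The vector state (2112 bytes with opmasks) is more than fourteen times the 144 bytes of general-purpose state.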
The RIP Register — Control Flow Is Just a Register
The instruction pointer — RIP — contains the address of the next instruction to execute.
The CPU fetch-decode-execute cycle:
1. Read memory at address in RIP → fetch instruction bytes
2. Decode those bytes → determine what operation and operands
3. Execute the operation
4. Advance RIP to next instruction (or jump sets RIP directly)
5. Repeat
Execution is just: RIP advances. A jump is just: RIP gets set to a new value. A function call is: push the return address (the address of the instruction after the call) onto the stack, then set RIP to the function address. A return is: pop the saved return address from the stack, then set RIP to it.
All control flow is register manipulation.
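The cycle above, and the claim that call and return are just RIP manipulation, fit in a toy interpreter. This minimal sketch makes several simplifying assumptions. RIP is an index into an array of decoded instructions rather than a byte address. The stack grows upward in an array, whereas the real x86-64 stack grows down. Cpu, Insn, Op, and run are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 32

enum Op { OP_ADD, OP_CALL, OP_RET, OP_JMP, OP_HALT };

typedef struct { enum Op op; int64_t arg; } Insn;

typedef struct {
    uint64_t rip;                 /* index of the next instruction */
    uint64_t stack[STACK_SLOTS];  /* return addresses */
    int sp;                       /* slots in use (grows upward here) */
    int64_t acc;                  /* one general-purpose "register" */
} Cpu;

static void run(Cpu *c, const Insn *prog, int len) {
    for (;;) {
        assert(c->rip < (uint64_t)len);
        Insn in = prog[c->rip];            /* fetch */
        c->rip++;                          /* advance RIP past it */
        switch (in.op) {                   /* decode + execute */
        case OP_ADD:
            c->acc += in.arg;
            break;
        case OP_CALL:
            assert(c->sp < STACK_SLOTS);
            c->stack[c->sp++] = c->rip;    /* push return address */
            c->rip = (uint64_t)in.arg;     /* jump to the callee */
            break;
        case OP_RET:
            assert(c->sp > 0);
            c->rip = c->stack[--c->sp];    /* pop it straight into RIP */
            break;
        case OP_JMP:
            c->rip = (uint64_t)in.arg;     /* a jump just sets RIP */
            break;
        case OP_HALT:
            return;
        }
    }
}
```

Running CALL 3, ADD 1, HALT, ADD 10, RET leaves acc = 11. CALL saves index 1 and jumps to 3, and RET pops 1 back into RIP.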
A buffer overflow that overwrites the return address on the stack — it’s overwriting the value that will be popped into RIP. When ret executes, it pops your value into RIP, and the CPU starts fetching instructions from wherever you pointed.
That’s not a metaphor. That’s literally what happens. The CPU has no opinion about it. It executes whatever RIP points to.
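The overwrite can be simulated without any real undefined behavior by modeling a frame as a byte array. This minimal sketch assumes a 16-byte local buffer directly below the saved RBP and the return address. That is the usual order in a frame-pointer build, but real layouts vary with compiler, flags, and stack protectors. unchecked_copy stands in for an unbounded copy such as strcpy, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets in a simulated frame, lowest address first. */
enum { BUF_OFF = 0, SAVED_RBP_OFF = 16, RET_OFF = 24, FRAME_SIZE = 32 };

/* Stand-in for an unbounded copy into the 16-byte local buffer: it never
   checks len against the buffer size, only against the simulated frame. */
static void unchecked_copy(uint8_t frame[FRAME_SIZE], const uint8_t *src, size_t len) {
    assert(BUF_OFF + len <= FRAME_SIZE);   /* keeps the simulation itself in bounds */
    memcpy(frame + BUF_OFF, src, len);
}

/* The 8 bytes ret would pop into RIP, read little-endian as on x86-64. */
static uint64_t saved_return_address(const uint8_t frame[FRAME_SIZE]) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | frame[RET_OFF + i];
    return v;
}
```

Copying 16 bytes leaves the saved return address intact. Copying 32 bytes of 'A' replaces it with 0x4141414141414141, which is exactly the value a ret would load into RIP.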
Summary — What You Own After 0.3
Register = flip-flop array, inside CPU core, no address, sub-nanosecond
16 GPRs = RAX RBX RCX RDX RSI RDI RSP RBP R8–R15
Sub-registers = EAX/AX/AH/AL — different width views of same register
Calling conv = RDI RSI RDX RCX R8 R9 for args, RAX for return
Callee-saved = RBX RBP R12–R15 — function must preserve
Caller-saved = rest — caller must save if it cares
RIP = instruction pointer — control flow is just this register
Register rename = CPU has 168+ physical regs behind the 16 you see
Context switch = OS saves/restores all registers between processes
Spill = register allocator evicting to stack when regs exhausted
One sentence:
Registers are the only place computation actually happens: tiny, impossibly fast storage that has names but no addresses, wired directly to the CPU’s arithmetic logic, and the ultimate target of everything an attacker wants to control.
0.3 Extended — Register Conventions
The Full Contract Between Every Function That Ever Calls Another
Why Conventions Exist At All
The CPU doesn’t enforce any of this.
You could write a program where function arguments go in R15, R14, R13. The CPU doesn’t care. It executes whatever instructions you give it.
The calling convention is a social contract — agreed upon by compiler writers, OS designers, and library authors so that code compiled by different compilers, from different languages, can call each other without negotiating every time.
When your C code calls printf from glibc — your compiler and the glibc compiler never met. They agreed on the convention. That’s the only reason it works.
On Linux x86-64 the convention is called the System V AMD64 ABI. On Windows x86-64 it’s different: Microsoft defined their own. This is one reason Windows and Linux binaries are incompatible even on identical hardware. Different executable formats (ELF vs PE) and different system call interfaces are others.
The Full Register Map — Burned Into Memory
REG          ALT NAME   ROLE                                    SAVED BY
─────────────────────────────────────────────────────────────────────────
RAX          -          Return value (int/ptr)                  Caller
RBX          -          General purpose                         Callee
RCX          4th arg    4th integer argument                    Caller
RDX          3rd arg    3rd integer argument                    Caller
RSI          2nd arg    2nd integer argument                    Caller
RDI          1st arg    1st integer argument                    Caller
RSP          sp         Stack pointer                           Callee (special)
RBP          fp/bp      Frame pointer (base pointer)            Callee
R8           5th arg    5th integer argument                    Caller
R9           6th arg    6th integer argument                    Caller
R10          -          Scratch / static chain ptr              Caller
R11          -          Scratch                                 Caller
R12          -          General purpose                         Callee
R13          -          General purpose                         Callee
R14          -          General purpose                         Callee
R15          -          General purpose                         Callee
RIP          pc         Instruction pointer                     N/A
RFLAGS       -          Condition codes                         Caller
XMM0–XMM7    -          Float/SSE args 1–8; XMM0 = float return Caller
XMM8–XMM15   -          Scratch float registers                 Caller
Caller-Saved vs Callee-Saved — The Exact Mental Model
Imagine you are writing a function. You’re mid-computation. You have a value in RAX you spent 20 instructions computing. You need to call malloc.
Does malloc promise to leave your RAX alone?
No. RAX is caller-saved. malloc will absolutely write its return value into RAX. Your value is gone.
If you need that value after the call — you (the caller) must push it to the stack before the call and pop it back after.
; You computed something precious, it’s in RAX
push rax ; save it: your responsibility
; (push moves RSP by 8; real code must also keep
; RSP 16-byte aligned at the CALL)
mov rdi, 64 ; argument to malloc
call malloc ; RAX now = pointer malloc returned
pop rbx ; restore your precious value
; (into RBX, not RAX: RAX is the malloc result)
Now imagine you are writing a library function. You need RBX for your own work. RBX is callee-saved. The caller may have something precious in RBX and is trusting you not to destroy it.
You (the callee) must save RBX at your function entry and restore it before returning.
my_function:
push rbx ; save caller's RBX — your responsibility
; ... use RBX freely for your own work ...
pop rbx ; restore it
ret
The CPU doesn’t enforce this. If you trash RBX without restoring it, the CPU executes happily. The caller will just find garbage where their value used to be, and the resulting bug will be spectacular and nearly impossible to diagnose.
Argument Passing — The Exact Order
Integer and pointer arguments, left to right:
1st arg → RDI
2nd arg → RSI
3rd arg → RDX
4th arg → RCX
5th arg → R8
6th arg → R9
7th+ args → pushed on stack, right to left
Concrete example:
ssize_t write(int fd, const void *buf, size_t count);
When you call write(1, "hello", 5):
RDI = 1 (fd)
RSI = address of "hello" (buf)
RDX = 5 (count)
Then call write.
Another:
int something(int a, int b, int c, int d, int e, int f, int g);
RDI = a
RSI = b
RDX = c
RCX = d
R8 = e
R9 = f
[RSP+8] = g (on stack, just above the return address; offset as seen on entry to the callee)
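The ordering rules above reduce to a small lookup. This minimal sketch covers only integer and pointer arguments; floats (which use XMM registers) and structs are ignored. sysv_arg_location is a hypothetical helper. It reports a register name, or a stack offset relative to RSP on entry to the callee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* System V AMD64: first six integer/pointer arguments, in order. */
static const char *const SYSV_INT_ARGS[6] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Location of the n-th (1-based) integer/pointer argument on entry to the
   callee. Returns a register name, or NULL with *rsp_offset set for stack
   arguments ([RSP] itself holds the return address). */
static const char *sysv_arg_location(int n, int *rsp_offset) {
    assert(n >= 1);
    if (n <= 6)
        return SYSV_INT_ARGS[n - 1];
    *rsp_offset = 8 * (n - 6);   /* 7th at RSP+8, 8th at RSP+16, ... */
    return NULL;
}
```

For the seven-argument something() above, the lookup puts d in RCX and g at RSP+8, matching the listing.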
The Stack Arguments — Exactly How
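When arguments spill to the stack, the caller places them there before the call, either with push or with mov into space it reserved earlier. This is the stack immediately after the CALL instruction executes, on entry to the callee: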
HIGH ADDRESS
┌─────────────────┐
│ arg 9 │ ← RSP + 24 (if 9+ args)
├─────────────────┤
│ arg 8 │ ← RSP + 16
├─────────────────┤
│ arg 7 │ ← RSP + 8
├─────────────────┤
│ return address │ ← RSP (pushed by CALL instruction)
└─────────────────┘
LOW ADDRESS
Inside the called function, after a frame-pointer prologue (push rbp; mov rbp, rsp):
HIGH ADDRESS
┌─────────────────┐
│ arg 9           │ ← RBP + 32
├─────────────────┤
│ arg 8           │ ← RBP + 24
├─────────────────┤
│ arg 7           │ ← RBP + 16
├─────────────────┤
│ return address  │ ← RBP + 8
├─────────────────┤
│ saved RBP       │ ← RBP (the prologue pushed caller's RBP)
├─────────────────┤
│ local vars      │ ← RBP - 8, RBP - 16 ...
└─────────────────┘
LOW ADDRESS
The ABI requires the stack to be 16-byte aligned at the point of CALL: RSP must be divisible by 16 just before the return address is pushed. CALL pushes 8 bytes (the return address), so on entry to the callee RSP sits 8 bytes off a 16-byte boundary. The callee’s prologue typically pushes RBP (another 8 bytes), restoring 16-byte alignment. This matters for instructions such as movaps that require 16-byte-aligned memory operands. If the stack is misaligned when they access a stack slot, they fault.
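The alignment bookkeeping above is plain modular arithmetic. This minimal sketch assumes a frame-pointer prologue with no other pushes. after_call, after_push, and frame_size_for are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Each step lowers RSP by 8, as CALL (return address) and PUSH do. */
static uint64_t after_call(uint64_t rsp) { return rsp - 8; }
static uint64_t after_push(uint64_t rsp) { return rsp - 8; }

/* Round a local-area size up to a multiple of 16 so that, after
   "push rbp; sub rsp, N", RSP is again 16-byte aligned for this
   function's own CALLs. */
static uint64_t frame_size_for(uint64_t need) {
    return (need + 15) & ~UINT64_C(15);
}
```

Starting from an aligned RSP, entry leaves RSP ≡ 8 (mod 16), and push rbp brings it back to 0. Rounding the local area up to a multiple of 16 (20 bytes becomes 32) keeps the function’s own calls aligned.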
Return Values — Every Case
TYPE                            WHERE RETURNED
──────────────────────────────────────────────────────
int, long, pointer, size_t      RAX
128-bit integer                 RDX:RAX (RAX = low 64 bits,
                                RDX = high 64 bits)
float, double                   XMM0
long double (80-bit)            ST(0), the x87 FPU stack
struct ≤ 16 bytes               Usually RAX/RDX and/or XMM0/XMM1,
                                depending on the field types
struct > 16 bytes               Caller allocates space,
                                passes its address as a hidden
                                first argument in RDI,
                                function fills it, returns
                                that address in RAX
The struct return case is subtle and important. If you write:
typedef struct { long a; long b; long c; } Big;
Big get_big(void);
The compiled call looks like this under the hood:
// What the compiler actually generates, conceptually:
Big result; // caller allocates on stack
get_big_hidden(&result); // passes hidden pointer in RDI
// get_big writes into *hidden_ptr
// returns the pointer in RAX (which caller ignores or uses)
This is why returning large structs by value in C isn’t “free” — the caller is silently allocating stack space and passing a hidden pointer. You won’t see this in the C source. You will see it in the disassembly.
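The lowering can be written out by hand. This minimal sketch assumes x86-64 Linux, where long is 8 bytes, so Big is 24 bytes. get_big_lowered is a hypothetical name for what the compiler effectively produces, with the hidden pointer turned into an explicit parameter.

```c
#include <assert.h>

typedef struct { long a; long b; long c; } Big;   /* 24 bytes on x86-64 Linux */

/* What you write: an ordinary by-value return. */
Big get_big(void) {
    Big r = { 1, 2, 3 };
    return r;
}

/* Hypothetical hand-lowered equivalent: the caller owns the storage, the
   callee fills it through the pointer (RDI in the real ABI) and returns
   the same pointer (RAX in the real ABI). */
Big *get_big_lowered(Big *hidden_ret) {
    hidden_ret->a = 1;
    hidden_ret->b = 2;
    hidden_ret->c = 3;
    return hidden_ret;
}
```

Both produce the same struct contents; disassembling get_big should show it writing through RDI much as get_big_lowered does explicitly.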
The Function Prologue — Exactly What Happens
Every standard function begins with a prologue that sets up its stack frame:
push rbp ; save caller's frame pointer (callee-saved)
mov rbp, rsp ; RBP now points to our frame base
sub rsp, N ; allocate N bytes for local variables
And ends with an epilogue:
mov rsp, rbp ; restore stack pointer (undo local allocation)
pop rbp ; restore caller's frame pointer
ret ; pop return address into RIP, jump there
Or equivalently using the leave instruction:
leave ; does: mov rsp, rbp; pop rbp
ret
With -O2 (which on x86-64 GCC already implies -fomit-frame-pointer), RBP is freed as a general-purpose register, and the prologue shrinks or disappears. Stack unwinding then relies on DWARF .eh_frame metadata instead. Backtraces still work, but RBP no longer serves as the anchor.
Seeing It For Real
Take this C:
int add(int a, int b) {
return a + b;
}
int main(void) {
int x = add(3, 7);
return x;
}
Compile: gcc -O0 -S -masm=intel -o out.s file.c (-masm=intel selects the Intel syntax shown here; GCC defaults to AT&T syntax)
You’ll see something like:
add:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi ; spill arg a to stack
mov DWORD PTR [rbp-8], esi ; spill arg b to stack
mov edx, DWORD PTR [rbp-4] ; reload a
mov eax, DWORD PTR [rbp-8] ; reload b
add eax, edx ; eax = a + b
pop rbp
ret ; return value in EAX
main:
push rbp
mov rbp, rsp
sub rsp, 16 ; allocate 16 bytes for locals
mov esi, 7 ; 2nd arg → ESI
mov edi, 3 ; 1st arg → EDI
call add
mov DWORD PTR [rbp-4], eax ; store return value (x)
mov eax, DWORD PTR [rbp-4] ; load x into return reg
leave
ret
At -O2:
add:
lea eax, [rdi + rsi] ; eax = a + b directly from arg regs
ret
main:
mov eax, 10 ; compiler computed 3+7=10 at compile time
ret
The -O0 version is the calling convention made visible. Every arg explicitly spilled and reloaded. Every local at a known stack address. Painfully inefficient — but debuggable.
The -O2 version is the calling convention optimized away. Arguments never touch the stack. The result is a compile-time constant. The function call itself disappears.
Variadic Functions — How printf Works
printf(const char *fmt, ...) takes variable arguments. How does it know where they are?
The ABI has rules for this too.
Integer/pointer variadic args go into the same registers as normal args, in order. If fmt is the first arg (RDI), then the first variadic arg is RSI, then RDX, etc.
But here’s the catch: printf doesn’t know at compile time how many args there are. It reads fmt to find out how many %d, %s etc. are in the format string.
It then reads those args from registers. That works only because printf’s own prologue saves them to a known location on its stack at function entry. This is the register save area. In a variadic function’s prologue, the integer argument registers (RDI, RSI, RDX, RCX, R8, R9) are dumped to a contiguous block on the stack, along with XMM0–XMM7 when AL says vector registers were used. va_arg can then walk through them with a simple offset.
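The same machinery is available to any C function through <stdarg.h>. In this minimal sketch, sum_ints is a hypothetical function whose first argument, like printf’s format string, tells it how many variadic ints to read. On x86-64 Linux, the va_list walks the register save area first and then the caller’s stack.

```c
#include <assert.h>
#include <stdarg.h>

/* count plays the role printf's format string plays: it is the only way
   the callee knows how many variadic ints to fetch. */
static long sum_ints(int count, ...) {
    va_list ap;
    long total = 0;
    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* asking for more than were passed is undefined */
    va_end(ap);
    return total;
}
```

With eight variadic values plus the count, the count lands in RDI, the first five values in RSI through R9, and the last three on the stack. va_arg hides the switch-over.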
A format string vulnerability is: you call printf(user_input). No format arguments. But printf thinks it has arguments: whatever garbage is in RSI, RDX, RCX, R8, R9, and then whatever is on the stack above the frame. %x reads and prints those. %n writes the count of characters printed so far to the address taken from the corresponding argument slot.
The attacker controls the format string. They control what gets read. They control what gets written. The register convention is what makes the corruption predictable — they know exactly which register or stack slot corresponds to which %x in the format string.
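The standard defense is to never let untrusted text be the format. This minimal sketch shows the safe pattern: pass the input as an argument to a fixed "%s" format, so the function reads exactly one variadic argument and prints any % sequences literally. snprintf is used only so the result can be checked, and render_safely is a hypothetical name. GCC’s -Wformat-security warning flags the unsafe printf(user_input) form.

```c
#include <assert.h>
#include <stdio.h>

/* Safe pattern: untrusted text is an argument, never the format. With a
   fixed "%s" format, snprintf consumes exactly one variadic argument and
   copies any %x or %n in the input as plain characters. */
static int render_safely(char *out, size_t cap, const char *untrusted) {
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%s", untrusted);
}
```

Fed the attacker string "%x %x %n", this copies those eight characters verbatim and reads or writes nothing else.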
Windows x64 ABI — The Differences
If you ever read Windows exploit code or reverse Windows binaries:
1st arg → RCX (not RDI)
2nd arg → RDX (not RSI)
3rd arg → R8 (not RDX)
4th arg → R9 (not RCX)
5th+ → stack
And critically: Windows requires the caller to allocate 32 bytes of shadow space on the stack before every call — even if the callee uses no stack args. It’s reserved space the callee can use to spill its register args if it wants. The callee doesn’t have to use it, but the space must be there.
This is one reason Windows shellcode and Linux shellcode differ even on the same CPU. Same ISA. Different ABI. Different stack layout. Different argument registers. (The system call interfaces differ completely too.)
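The Windows layout described above also reduces to a small lookup. This minimal sketch covers integer and pointer arguments only, and win64_arg_location is a hypothetical helper. On entry to the callee, [RSP] holds the return address and the next 32 bytes are the shadow space, so the 5th argument sits at RSP+40.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Windows x64: first four integer/pointer arguments, in order. */
static const char *const WIN64_INT_ARGS[4] = { "rcx", "rdx", "r8", "r9" };

/* Location of the n-th (1-based) integer/pointer argument on entry to the
   callee: [RSP] = return address, [RSP+8..RSP+39] = 32-byte shadow space. */
static const char *win64_arg_location(int n, int *rsp_offset) {
    assert(n >= 1);
    if (n <= 4)
        return WIN64_INT_ARGS[n - 1];
    *rsp_offset = 8 + 32 + 8 * (n - 5);   /* 5th at RSP+40, 6th at RSP+48 */
    return NULL;
}
```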
Summary — The Full Contract
ARG ORDER RDI RSI RDX RCX R8 R9 → stack (right to left)
RETURN RAX (int/ptr), XMM0 (float), RAX+RDX (large int)
CALLEE-SAVED RBX RBP R12 R13 R14 R15 RSP
CALLER-SAVED RAX RCX RDX RSI RDI R8 R9 R10 R11 RFLAGS XMM0-15
PROLOGUE push rbp / mov rbp,rsp / sub rsp,N
EPILOGUE leave / ret or mov rsp,rbp / pop rbp / ret
ALIGNMENT RSP must be 16-byte aligned before CALL
WINDOWS DIFF RCX RDX R8 R9 for args + 32-byte shadow space
One sentence:
The calling convention is the invisible contract that lets separately compiled code interoperate — it specifies exactly which register holds each argument, who saves what, where the return value appears, and how the stack is shaped at every call — and every exploit that touches function calls is exploiting a deviation from or manipulation of this contract.