The CPU and Memory Bus
How the CPU Actually Reaches Out and Touches RAM
The Problem
The CPU is fast. Absurdly fast.
A modern CPU is clocked at 3–5 GHz. That’s 3–5 billion clock cycles per second. In one cycle, it can add two numbers, compare two values, or jump to a new instruction.
RAM is slow. Relatively speaking — glacially slow.
Getting a byte from RAM takes roughly 60–100 nanoseconds.
In that time, your 4GHz CPU has ticked through 240–400 clock cycles.
The CPU asked for a byte. The CPU waited. The CPU did nothing but wait for hundreds of cycles.
This gap — between CPU speed and RAM speed — is the central tension in all of computer architecture. Everything in Tier 0 is, in some way, about this gap.
But first: how does the CPU ask at all?
The Bus
Between the CPU and the RAM chips sits a physical communication channel — the memory bus.
“Bus” in hardware means: a shared set of wires carrying signals between components.
The memory bus has three logical parts:
┌─────────────┐                     ┌─────────────┐
│             │───[ ADDRESS BUS ]──▶│             │
│     CPU     │◀──[  DATA BUS   ]──▶│     RAM     │
│             │───[ CONTROL BUS ]──▶│             │
└─────────────┘                     └─────────────┘
Address bus — CPU puts the address it wants on these wires. “I want byte number 0x7FFF1234.” These wires carry that number to the RAM. Unidirectional — CPU to RAM.
Data bus — the actual bytes travel here. On a read: RAM puts the data on these wires, CPU picks it up. On a write: CPU puts the data, RAM picks it up. Bidirectional.
Control bus — coordination signals. READ or WRITE? Is the data ready yet? Clock synchronization. The handshaking that makes the transfer happen correctly.
The Memory Controller
The CPU doesn’t talk to RAM directly. Between them sits the memory controller.
On older systems (roughly through the mid-2000s on x86), the memory controller was a separate chip on the motherboard: the “northbridge.”
On modern systems — including your x86-64 machine — the memory controller is integrated directly into the CPU die.
This matters because it reduces latency. A request no longer has to travel off the CPU package to a separate chip and back before it even reaches the RAM. The controller logic stays on-die.
The memory controller’s job:
- Translate the address the CPU wants into the row/column signals the DRAM needs
- Handle the timing — DRAM has strict timing requirements (RAS, CAS latency — the row/column delays from 0.1)
- Manage refresh — constantly cycling through all rows to prevent charge decay
- Handle multiple RAM channels — modern CPUs have dual or quad channel memory controllers, running multiple buses in parallel for bandwidth
A Read, Step by Step
Your C program does this:
int x = *p;
At the hardware level:
1. CPU determines the physical address behind p. (Via the MMU — Tier 1. For now, assume it maps directly.)
2. CPU puts that address on the address bus.
3. CPU asserts READ on the control bus.
4. Memory controller receives the address. Decodes it into: which RAM chip, which row, which column.
5. Memory controller sends the row address to the DRAM and asserts RAS (Row Address Strobe).
6. DRAM loads that row into its row buffer. This takes time. The memory controller waits.
7. Memory controller sends the column address and asserts CAS (Column Address Strobe).
8. DRAM outputs the byte(s) from that column of the row buffer onto the data bus.
9. CPU reads the data bus. The value is now in a CPU register.
10. CPU executes the rest of the instruction.
Total elapsed time from step 2 to step 9: ~60–100 nanoseconds. At 4GHz, that is ~240–400 CPU cycles of waiting.
A Write, Step by Step
*p = 42;
1. CPU puts the address on the address bus.
2. CPU puts the value (42) on the data bus.
3. CPU asserts WRITE on the control bus.
4. Memory controller routes it to the correct RAM chip, row, column.
5. The value is written into the capacitors at that location, one capacitor per bit.
Writes have their own timing constraints — the DRAM must have the row loaded before the column can be written, same as reads.
Bus Width
The data bus isn’t 1 bit wide. It isn’t 8 bits (one byte) wide.
Modern memory buses are 64 bits wide — 8 bytes transferred per transaction.
But a single transaction moves more than 8 bytes. DDR memory transfers data on both the rising and falling edge of the clock (that’s what “Double Data Rate” means), and each access is a burst of several back-to-back transfers, typically 8. The 64-bit bus itself is built from multiple DRAM chips on the module working in parallel.
The practical result: when your CPU reads one int (4 bytes) from RAM, the memory controller actually fetches a full cache line of 64 bytes, because the burst makes it nearly free to grab the neighboring bytes at the same time.
This is why cache lines are 64 bytes (0.5). It’s not an arbitrary number. It matches the natural transaction size of the bus: 8 bytes per transfer × 8 transfers per burst = 64 bytes.
The Speed Numbers, Concretely
OPERATION                    LATENCY            CPU CYCLES (at 4GHz)
─────────────────────────────────────────────────────────────────────
CPU register access          ~0.3 ns            ~1
L1 cache hit                 ~1 ns              ~4
L2 cache hit                 ~4 ns              ~16
L3 cache hit                 ~10 ns             ~40
RAM access (cache miss)      ~80 ns             ~320
SSD (NVMe)                   ~100,000 ns        ~400,000
HDD                          ~10,000,000 ns     ~40,000,000
Look at that table. Really look at it.
RAM is roughly 320x slower than a register. Your SSD is over 1,000x slower than RAM. Your spinning disk is another 100x slower than that.
Every time your code causes a cache miss (accessing memory that isn’t in the L1/L2/L3 cache), your CPU can stall for around 320 cycles doing nothing.
Stuxnet’s code was carefully optimized to avoid detection partly by being fast and silent — not thrashing memory, not causing unusual access patterns that monitoring tools might catch. Understanding the memory hierarchy isn’t just academic. At that level, it’s operational.
What “Random Access” Actually Means
RAM stands for Random Access Memory.
“Random access” means: you can access any address in approximately the same time. Contrast with a tape drive, where you have to physically seek to the position.
This is mostly true — any address takes roughly the same 60–100ns.
But “approximately” is doing a lot of work there.
Accessing address 0x1000, then 0x1008, then 0x1010 (sequential): those all sit in the same DRAM row, and in fact in the same 64-byte cache line. The row buffer is warm. Each access after the first is faster.
Accessing address 0x1000 then 0xFF001000 then 0x50008000 — random jumps — each one hits a different row. Row buffer evicted, new row loaded each time. Full latency every time.
So: RAM is random access in the sense that any address is reachable. It is not random access in the sense that all patterns have equal cost.
This is why your data structures matter. A linked list chasing pointers across the heap — every next pointer potentially points to a completely different row. Cache miss after cache miss. An array — sequential addresses, same rows, cache warm.
At the level you want to operate: understanding this is the difference between code that runs and code that flies.
The Bigger Picture
┌────────────────────────────────────────────────────────────┐
│                          CPU DIE                           │
│                                                            │
│  ┌──────────┐    ┌──────┐    ┌──────┐    ┌──────────────┐  │
│  │  CORES   │────│  L1  │────│  L2  │────│      L3      │  │
│  │(registers│    │cache │    │cache │    │    cache     │  │
│  │ inside)  │    │      │    │      │    │   (shared)   │  │
│  └──────────┘    └──────┘    └──────┘    └──────┬───────┘  │
│                                                 │          │
│                                      ┌──────────┴──────┐   │
│                                      │MEMORY CONTROLLER│   │
│                                      └──────────┬──────┘   │
└─────────────────────────────────────────────────┼──────────┘
                                                  │
                                             [MEMORY BUS]
                                                  │
                                       ┌──────────┴──────┐
                                       │   RAM MODULES   │
                                       │  (DRAM chips)   │
                                       └─────────────────┘
The CPU, caches, and memory controller are all on the same die. The RAM is off-die, connected by the memory bus. That bus crossing is where the latency lives.
What You Own After 0.2
Memory bus = address + data + control wires between CPU and RAM
Memory controller = on-die, translates CPU requests to DRAM signals
Read = CPU puts address on bus, waits ~80ns, gets data back
Write = CPU puts address + data on bus, DRAM stores it
Bus width = 64 bits, but real transfers are 64-byte cache lines
Random access = any address reachable, but sequential is faster
The gap = CPU at 4GHz, RAM at ~80ns — 320 cycles of waiting
One sentence:
The CPU requests bytes by address over a physical bus, waits hundreds of cycles for the RAM to respond, and the entire architecture of caches and memory layout exists to hide that wait.