Basics Master - Part 1

The Type System

Before getting into C’s type system, we need to understand the problem it was solving. The roots of that problem date back to the beginning of the UNIX operating system and the role C played in building it.

#c #type-system

Chapter 0: Before Getting into the Type System

The Problem C Was Solving

1972. Dennis Ritchie is rewriting Unix in C.

The machine underneath is the PDP-11. It has:

  • 16-bit words
  • Byte addressability
  • A handful of registers
  • Very little RAM by any modern standard

The question Ritchie faced wasn’t abstract. It was brutally practical:

How do I describe data to the compiler in a way that maps honestly to what the hardware actually has?

Not “what’s mathematically elegant.” Not “what’s safe.” What maps to the metal.

Every type in C is the answer to that question.

The Hardware Has No Types

The CPU doesn’t know what an int is.

The CPU has registers of a fixed width. It has instructions that operate on 8-bit, 16-bit, 32-bit, 64-bit chunks of memory. It has no concept of “integer” vs “character” vs “pointer.” It moves bits. It adds bits. It compares bits.

The type system is entirely a compiler fiction — a set of promises you make to the compiler about how to interpret a region of bytes so it can emit the right instructions.

When you write:

int x = 42;

You’re not creating an “integer” in any mathematical sense. You’re telling the compiler:

Reserve some bytes. Interpret them as a signed integer of whatever width int is on this platform. Emit signed arithmetic and signed comparisons when I use it, and sign-extend it when I widen it. Assume signed overflow never happens, because the language leaves it undefined.

The compiler listens. The CPU just sees bytes.

The Original Sin — Sizes Are Not Fixed

Here’s where C made a choice that still causes pain today.

Ritchie didn’t fix the sizes of types to specific bit widths. He defined them relative to the platform, and the C standard later codified only minimums:

  • char — the natural character size of the machine. At least 8 bits.
  • short — at least 16 bits. At least as big as char.
  • int — the natural word size of the machine. At least 16 bits.
  • long — at least 32 bits. At least as big as int.

“Natural word size” meant: whatever the CPU is most comfortable with. On the PDP-11, that was 16 bits. On a 32-bit machine, 32 bits. The idea was that int should be the fastest integer type on any given platform.

This was pragmatic in 1972. It became a nightmare when code had to run on multiple platforms.

  • On 32-bit Linux: int = 32 bits, long = 32 bits.
  • On 64-bit Linux: int = 32 bits, long = 64 bits.
  • On 64-bit Windows: int = 32 bits, long = 32 bits.

Same source code. Different behavior. Silent. No warning.

This is why stdint.h was added in C99, nearly three decades later: int32_t, uint64_t. Exact widths. No ambiguity. No platform surprises.

You will use fixed-width types for anything that matters. You will understand why they exist by understanding this history.

What a Type Actually Is — Three Things

Every type in C specifies exactly three things:

1. Size — how many bytes does this occupy?

2. Alignment — what address boundary must it start on?

3. Interpretation — how are the bits decoded? Signed? Unsigned? Float? Pointer?

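unsigned int x = 0xFFFFFFFF;
int          y = 0xFFFFFFFF;  /* out of range for int: the conversion is implementation-defined, -1 on mainstream compilers */

Assuming a 32-bit int: same 4 bytes. Same bit pattern: 11111111 11111111 11111111 11111111.

x = 4,294,967,295. y = -1.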

The bytes are identical. The type is the instruction to the compiler about how to interpret them.

The type exists at compile time. At runtime, there are only bytes.

The Sign Question

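Every standard integer type except _Bool has a signed and an unsigned variant. Plain char is the odd one out: whether it behaves as signed or unsigned is up to the implementation.

Signed: the high bit is the sign bit. Two’s complement representation on every platform you are likely to meet (C23 finally requires it). Overflow is undefined behavior in C.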

Unsigned: no sign bit. All bits are magnitude. Overflow wraps around modulo 2^N. Defined behavior.

Two’s complement means: to negate a number, flip all bits and add 1.

 42 in 8-bit:  00101010
-42 in 8-bit:  11010110  (flip: 11010101, add 1: 11010110)

The genius of two’s complement: addition and subtraction use the same hardware for signed and unsigned. The CPU doesn’t need separate add circuits. The interpretation is entirely in how you read the result.

This is why C separates signed and unsigned at the type level — same bits, same instructions, different meaning.

Floating Point — A Different World

Integer types are exact. int x = 7 is exactly 7.

Floating point types are approximations.

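float is typically 32 bits and double 64 bits. On virtually every modern platform they follow the IEEE 754 standard, a specific encoding that trades exactness for range: about 7 significant decimal digits for float and 15 to 17 for double, across a huge span of magnitudes.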

But here’s the truth every systems programmer must internalize:

double x = 0.1;

x is not 0.1. It is 0.1000000000000000055511151231257827021181583404541015625.

Because 0.1 cannot be represented exactly in binary floating point. Just like 1/3 can’t be represented exactly in decimal.

This isn’t a bug. It’s a fundamental property of the representation. Most floating-point operations round their result, and those rounding errors accumulate. For graphics, physics simulation, and signal processing, that is manageable. For financial calculations, it can be catastrophic. Financial systems generally avoid binary floating point for money; they use fixed-point integer arithmetic, decimal types, or arbitrary-precision libraries.

And sometimes the numbers drive physical machinery. Stuxnet manipulated the PLC setpoints that controlled centrifuge rotation speed. In code like that, getting a numeric representation wrong doesn’t produce a wrong answer in a test. It destroys hardware.

The Type System as a Communication Protocol

Here’s the deepest way to think about C’s type system:

Types are not for the CPU. The CPU doesn’t see them.

Types are a communication protocol between you and the compiler.

You say: this region of memory is a uint32_t. The compiler says: I’ll use 32-bit unsigned load/store instructions, I’ll place it at a 4-byte aligned address, I’ll warn you if you try to assign a pointer to it.

You say: this is a char *. The compiler says: I’ll use byte-wide loads, I’ll allow it to alias any other type (char* is special), I’ll do pointer arithmetic in 1-byte increments.

You say: this is a volatile uint32_t *. The compiler says: I will not optimize away any access through this pointer, every read will actually read, every write will actually write.

When you understand each type as a set of promises and constraints rather than just “a box that holds a number” — you start reading C the way it was meant to be read.

What Part 1 Builds Toward

The 19 topics in Part 1 aren’t a list of facts to memorize.

They’re the answer to one question:

How does C let you describe any data layout that hardware can represent, with enough precision that the compiler can generate exactly the instructions you intend?

By the end of Part 1 you won’t just know what uint32_t is. You’ll know why it exists, what it generates in assembly, what int generates instead, when they differ, and when getting that wrong means your exploit targets the wrong address on a different architecture.

Summary

Types        = size + alignment + interpretation, compile-time only
Hardware     = only bytes and bits, no types at runtime
Sizes        = not fixed in original C, hence stdint.h
Signed       = two's complement, overflow is UB
Unsigned     = wraps modulo 2^N, defined behavior
Float        = approximation, not exact, IEEE 754
Type system  = communication protocol between you and the compiler

We can put all of this in one TLDR sentence:

C’s type system is not a safety mechanism — it’s a precise vocabulary for telling the compiler how to interpret bytes and which machine instructions to emit, and understanding it at that level is the difference between writing C and commanding the machine.