What C Is
The One-Line Definition
C is an imperative procedural language, supporting structured programming, lexical variable scope, and recursion, with a static type system. It was designed to be compiled to provide low-level access to memory and language constructs that map efficiently to machine instructions, all with minimal runtime support. Wikipedia
Every word in that sentence carries weight. Let’s dissect each one.
COMPILED
Most beginners hear “compiled” and move on. Don’t.
A compiled language is one where source code is directly translated into machine code that can be executed by a processor. Upwork
The word directly matters. When you write Python or JavaScript, there is a runtime — a program that runs while your code runs, interpreting it line by line, managing memory for you, catching errors for you. That runtime costs time and resources. It sits between your code and the CPU at all times.
C has no such middleman at runtime. You write C. You compile it. The compiler produces a binary: raw machine instructions. When you run that binary, the CPU executes it directly. No interpreter or virtual machine sits between your code and the hardware; the operating system loads the program and services its system calls, and that is all.
A compiled language like C translates the entire source code into machine code before execution. Wikitechy
This is why C is fast. Not because of clever tricks — because it is fundamentally the closest thing to machine code that still reads like a human language.
The Four Stages of Compilation
When you type gcc hello.c -o hello, you are not pressing one button. You are triggering four distinct processes in sequence:
A C program’s building process involves four stages and utilizes different tools such as a preprocessor, compiler, assembler, and linker. Preprocessing is the first pass of any C compilation — it processes include-files, conditional compilation instructions and macros. Compilation is the second pass — it takes the output of the preprocessor and generates assembler source code. Assembly is the third stage — it takes the assembly source code and produces an assembly listing with offsets stored in an object file. Linking is the final stage — it takes one or more object files or libraries as input and combines them to produce a single executable file. HackerEarth
Let’s go deeper on each:
Stage 1 — Preprocessing (.c → .i)
The preprocessor runs before any real compilation. It handles every line starting with #. It does four things:
- Strips all comments: the machine will never see them
- Expands #include: literally copy-pastes the contents of header files into your source
- Expands #define macros: textual substitution before compilation
- Handles #ifdef/#ifndef: decides which code even gets compiled
The result is a .i file — pure expanded C with no directives, no comments. Just raw source.
Stage 2 — Compilation (.i → .s)
In this stage, the preprocessed code is translated to assembly instructions specific to the target processor architecture. These form an intermediate human-readable language. The existence of this step allows for C code to contain inline assembly instructions and for different assemblers to be used. Calleluks
This is where your C gets converted to assembly. The compiler reads your code, checks types, validates syntax, and emits assembly instructions for your CPU architecture. If you are compiling for x86-64, you get x86-64 assembly. If you are cross-compiling for ARM, you get ARM assembly.
The compiler generates assembly code. The content is an intermediate language that can be read by humans. TutorialsPoint
You can stop here with gcc -S hello.c and read the .s file. You’ll see the actual CPU instructions your C code became — push, mov, call, ret. This is the last moment a human can still read it.
Stage 3 — Assembly (.s → .o)
The filename.s is taken as input and turned into filename.o by the assembler. This file contains machine-level instructions. At this phase, only existing code is converted into machine language, and the function calls like printf() are not resolved. GeeksforGeeks
The .o file is binary. Pure 1s and 0s. CPU instructions in their final form. But it is not yet runnable — it has holes. Every function call to something defined elsewhere — like printf — is an unresolved reference, a placeholder.
Stage 4 — Linking (.o → executable)
The object code generated in the assembly stage is composed of machine instructions that the processor understands but some pieces of the program are out of order or missing. To produce an executable program, the existing pieces have to be rearranged and the missing ones filled in. The linker will arrange the pieces of object code so that functions in some pieces can successfully call functions in other ones. It will also add pieces containing the instructions for library functions used by the program. Calleluks
The linker is what knows where printf actually lives — inside the C standard library (libc). It stitches your object file and the library object code together into a single runnable binary.
This is also why Stuxnet’s DLL replacement attack worked. s7otbxdx.dll is a dynamically linked library: a pre-compiled binary whose exported functions the Siemens WinCC/Step 7 software looks up by name when the DLL is loaded, not when the program is built. Stuxnet renamed the original file to s7otbxsx.dll and put its own s7otbxdx.dll in its place, with the same exported function names but different behavior underneath. The loader resolves those names against whatever file is on disk, so the software trusted the binary that was there.
STATICALLY TYPED
C programming language is a statically typed language — meaning the type of variable is checked at the time of compilation, not at runtime. This means each time a programmer types a program they have to mention the type of variables used. GeeksforGeeks
When you write int x = 5; — the compiler locks in the fact that x is an integer at compile time. It will never be anything else. The CPU doesn’t know about types — it only knows about bytes. Static typing is the compiler enforcing rules before the program ever runs, so the generated machine code can be precise and fast.
Compare this to Python: x = 5 then later x = "hello" — perfectly legal. Python figures out what x is at runtime, every time you use it. That costs time. Every single access. C pays that cost zero times at runtime because it was resolved at compile time.
This is important for Stuxnet analysis: when you reverse-engineer compiled C code, you are reading machine code that has no type information left in it. The compiler stripped it. Reconstructing what a variable was — integer, pointer, struct — is a core challenge of reverse engineering.
MANUAL MEMORY
This is the sharpest edge in C. No garbage collector. No automatic cleanup. You are the memory manager.
C has three different pools of memory: static — global variable storage, permanent for the entire run of the program; stack — local variable storage, automatic and continuous memory; heap — dynamic storage, a large pool of memory not allocated in contiguous order. The Craft of Coding
The Stack
The stack is a special region of memory, automatically managed by the CPU — so you don’t have to allocate or deallocate memory. Stack memory is divided into successive frames where each time a function is called, it allocates itself a fresh stack frame. When a function finishes running, all the variables associated with that function on the stack are deleted and the memory they use is freed up. This leads to the local scope of function variables. The Craft of Coding
The stack is automatic. Declare a local variable inside a function — it lives on the stack. The function returns — the stack frame collapses and that memory is gone. You don’t call any function to free it. The CPU’s stack pointer register just moves.
The Heap
Unlike the stack, the heap allows for dynamic memory allocation at runtime. The heap is a region of memory that is not automatically managed and requires explicit code to allocate and free. Medium
When you call malloc(100), you are asking the C library’s allocator for at least 100 bytes from the heap; the allocator in turn requests memory from the OS when it needs more. That memory persists until you explicitly call free() on it: not when your function returns, not when your variable goes out of scope. It stays allocated until you say otherwise.
If you forget to call free(), that memory stays allocated for the life of the process. That is a memory leak. If you call free() twice on the same address, the behavior is undefined. If you write beyond what you allocated, that is a buffer overflow. Bugs like these are a direct path to exploitable security vulnerabilities, and memory-handling flaws of this general kind are what Stuxnet used to escape from userspace into kernel space on Windows.
Memory in a Running Program
Memory is divided into sections: the text segment stores the executable code of the program — it is read-only. The data segment stores global and static variables. The stack stores local variables, function parameters, and return addresses for each function call. The heap is used for dynamic memory allocation and is shared by all shared libraries and dynamically loaded modules in a process. GeeksforGeeks
So when your C program runs, memory looks like this:
HIGH ADDRESSES
┌─────────────────┐
│      STACK      │ ← grows downward
│  (local vars,   │   function frames,
│  return addrs)  │   managed by CPU
├─────────────────┤
│        ↓        │
│      (gap)      │
│        ↑        │
├─────────────────┤
│      HEAP       │ ← grows upward
│  (malloc/free)  │   you manage this
├─────────────────┤
│   DATA / BSS    │ ← global/static vars
├─────────────────┤
│      TEXT       │ ← your compiled code
│   (read-only)   │   actual machine instructions
└─────────────────┘
LOW ADDRESSES
Stuxnet modified memory in every one of these regions. It injected code into the text region of the DLL. It stored its configuration data in the data region. It used the heap to dynamically build and store intercepted PLC messages. Understanding this diagram is not optional for analyzing Stuxnet. It is mandatory.
CLOSE TO METAL
This phrase means: almost nothing happens in C that you didn’t explicitly ask for.
C was designed to be compiled to provide low-level access to memory and language constructs that map efficiently to machine instructions, all with minimal runtime support. Wikipedia
In Python: x = [] allocates a list object, a reference counter, internal bookkeeping structures, and may reserve space for future elements. You wrote one short line. The runtime did many things you didn’t ask for.
In C: int arr[10]; declared inside a function allocates exactly 40 bytes on the stack (assuming 4-byte integers). Ten integers. Contiguous. Nothing else. No bookkeeping. No metadata. No bounds checking. If you access arr[15], C doesn’t stop you. The behavior is undefined: in practice you usually read whatever bytes happen to live at that memory address. You might read another variable. You might read a return address. You might crash the program. You might corrupt data silently. C trusts you completely.
This is what close to metal means. The language does not protect you from yourself. In exchange, it gives you total control and zero overhead.
That total control is exactly what Stuxnet’s authors needed. To replace a DLL, intercept system calls, inject PLC bytecode, and fake sensor readings, you need to be able to reach into raw memory and manipulate bytes directly. Python and Java do not give you this by default; their runtimes sit between you and raw memory. Languages that are close to metal (C, C++, Assembly) give you that access directly.
What C IS NOT — and Why That Matters
C lacks exception handling — the ability to handle exceptions such as bugs and anomalies that can happen during source code execution. Since C isn’t object-oriented, it doesn’t offer constructor and destructor features. C isn’t equipped with garbage collection — this feature automatically reclaims memory from objects that are no longer required. TechTarget
No OOP. No GC. No exceptions. No bounds checking. No type inference. No closures. No built-in string type.
This is not a list of weaknesses. It is a list of choices. Every feature that was left out is a feature that would cost runtime overhead, or abstract away hardware access, or make the language harder to compile to efficient machine code.
C is the minimum viable language for talking to a computer. And minimum viable is exactly what you need when you are writing code that will run on a stripped-down industrial PLC with 256KB of memory and no operating system.
The Connection — 0.1 to Stuxnet
Every property of C you just read is directly visible in Stuxnet’s architecture:
COMPILED          → Stuxnet was a compiled binary. Reverse engineers
                    had to reconstruct its logic from machine code.
STATICALLY TYPED  → No type info survives in the binary. Analysts
                    had to infer struct layouts from memory access patterns.
MANUAL MEMORY     → The replaced DLL used the heap to hold intercepted
                    PLC messages. Memory-handling flaws in Windows
                    kernel code were used to escalate privileges.
CLOSE TO METAL    → Raw memory writes to the PLC's OB1 block.
                    Direct byte manipulation of Profibus packets.
                    No abstraction. Pure bytes.
This is what C is. Not a language for building websites. A language for talking directly to machines, at the level machines understand. That is why it was chosen — by Ritchie to write Unix, by embedded engineers to write firmware, and by the architects of the first cyberweapon to attack physical infrastructure.
Was Stuxnet only written in C?
No. And that is what makes it extraordinary.
Stuxnet is unusually large at half a megabyte in size, and written in several different programming languages including C and C++, which is also irregular for malware. Wikipedia
Here is the complete language breakdown, layer by layer:
Every Language Stuxnet Was Written In
1. C — The Dropper and Core Logic
Reverse engineering analysis indicates that the Stuxnet dropper and payload were most likely coded in C. The payload inserts itself between the PC used to monitor the Natanz centrifuge array and the target centrifuge array — a classic man-in-the-middle attack. CopyProgramming
The dropper is the part that spreads via USB, exploits Windows, installs itself, and persists. Raw C. Direct memory access, pointer manipulation, file I/O. Everything you are learning right now.
2. C++ — The Windows-Side Architecture
Stuxnet was coded in a few different languages, with significant portions in C++ and C. One security researcher, Tom Parker, noted that the injector and the payload appeared to have notably different coding styles. He posited the idea that these two elements were designed in two different countries. Wikidot
Compared to Stuxnet, which was entirely written in MSVC++, Duqu’s framework was written in an unknown programming language. Network World
The object-oriented Windows layer — the rootkit driver, the DLL replacement, the peer-to-peer update system — was built in C++ compiled with Microsoft Visual C++. Classes, vtables, structured component separation. This is why it was so modular and maintainable across multiple teams.
3. MC7 Bytecode — The PLC Payload
This is the one nobody outside industrial control systems had ever heard of before Stuxnet.
The tasks are distinct because, for instance, the hiding of infected code blocks takes place on the infected Windows machine using standard C/C++ code, whereas the malicious code that Stuxnet aims to run on the industrial control system executes on the PLC and is written in MC7 bytecode. MC7 is the assembly language that runs on PLCs and is often originally written in STL. Scribd
MC7 is Siemens’ proprietary bytecode — the machine code of the S7-300 PLC. It is what the PLC’s internal processor actually executes. Nobody at the time had publicly documented it. Stuxnet’s authors had to understand it at a binary level to inject code into OB1 and OB35.
In the case of the Siemens S7 PLCs, there are numerous programming languages available to configure the controller, including Statement List (STL), Ladder Diagram (LD), Function Block Diagram (FBD), and Structured Control Language (SCL). Regardless of input sources, the PLC program will be compiled into MC7/MC7+ bytecode, which is a lower-level representation of the code. It is not possible to decode MC7/MC7+ bytecode without reverse engineering, because Siemens has not publicly provided documentation to do so. Claroty
The authors didn’t just know C and C++. They knew an undocumented, proprietary embedded bytecode language for a specific industrial PLC that was never meant to be touched by anyone outside Siemens.
4. STL (Statement List) — The PLC Assembly
Siemens Simatic PLC can be programmed using a variety of languages, including ladder logic, low-level STL (Statement List, assembly-like) and higher-level SCL (Structured Control Language, Pascal-like). Regardless of the source type, the program is compiled to MC7 bytecode. Pnfsoftware
STL is effectively PLC assembly. Every STL instruction maps directly to one or a few MC7 bytecode instructions. The Stuxnet payload was originally written in STL then compiled to MC7 before being embedded inside the Windows-side C/C++ binary as a raw data blob.
The Full Map
LANGUAGE        LAYER                   PURPOSE
──────────────────────────────────────────────────────────────────────
C               Windows dropper         Spread via USB, exploit Windows
                                        zero-days, file I/O
C++ (MSVC++)    Windows architecture    Rootkit driver, DLL replacement,
                                        P2P update system, modular
                                        components
STL             PLC source              Assembly-like language to write
                                        the centrifuge attack logic
                                        (human-authored)
MC7 bytecode    PLC payload             Compiled from STL, injected raw
                                        into OB1 and OB35 blocks on the
                                        Siemens S7-300
Why This Matters for CR9 Cobra
A Symantec strategist estimates that 30 different people had a hand in the coding of Stuxnet. Wikidot
Thirty people. Multiple teams. Multiple countries. Multiple languages. Each team owned a layer.
The C team knew Windows internals and file manipulation. The C++ team knew object-oriented architecture and Windows driver development. The STL/MC7 team knew Siemens PLCs at the binary level — probably former Siemens engineers or people with direct access to the hardware.
No one person knew everything. But to analyze Stuxnet, which is what CR9 Cobra is building toward, you need to understand all of it. C and C++ first, because that is the bulk of the binary. Then STL and MC7 when you get to the PLC payload, which is the deepest and most dangerous part.