PWN Notes on Basics

Assembly Basics#

Mastery Level: Understand assembly, analyze gadgets, step through debugging to understand register states

Reference Article: x86_64 Assembly Part 1: AT&T Assembly Syntax_x86_64 Assembly at&t-CSDN Blog

Registers#

x86 has 8 commonly used general-purpose registers: EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP

The CPU operates on registers first and exchanges data with memory through the cache hierarchy. Registers are the fastest storage available to the CPU, which is why they are sometimes described as a zero-level cache.

Access speeds from high to low are: registers > Level 1 cache > Level 2 cache > Level 3 cache > memory > hard disk

General Registers and Their Uses#

The 8 registers mentioned above have specific purposes. Taking the 32-bit CPU as an example, here is a brief description of these registers:

Register	Meaning	Purpose	Contains Registers
EAX	Accumulator Register	Commonly used for multiplication, division, and function return values	AX(AH, AL)
EBX	Base Register	Often used as a pointer for memory data, or as a base for accessing memory	BX(BH, BL)
ECX	Counter Register	Commonly used as a counter in string and loop operations	CX(CH, CL)
EDX	Data Register	Commonly used for multiplication, division, and I/O pointers	DX(DH, DL)
ESI	Source Index Register	Commonly used as a pointer for memory data and source strings	SI
EDI	Destination Index Register	Commonly used as a pointer for memory data and destination strings	DI
ESP	Stack Pointer Register	Points to the top of the stack; by convention it is not used for general arithmetic or data transfer	SP
EBP	Base Pointer Register	Points to the base of the current stack frame and is used as the base for accessing stack data; it often receives a copy of ESP in the function prologue, and by convention it is not used for general arithmetic or data transfer	BP

In the above table, each commonly used register has other names; rax, eax, ax, ah, al actually represent the same register, just with different scopes.

Below is the correspondence for 64-bit registers:

|63..32|31..16|15-8|7-0|
              |AH. |AL.|
              |AX......|
       |EAX............|
|RAX...................|
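The nesting shown in the diagram can be checked with plain bit masks. The following Python sketch is illustrative only; the helper name sub_registers is hypothetical and simply slices a 64-bit value the same way the register names do.

```python
def sub_registers(rax: int) -> dict:
    """Split a 64-bit RAX value into the narrower views that alias it."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep only 64 bits
    return {
        "rax": rax,                    # bits 63..0
        "eax": rax & 0xFFFFFFFF,       # bits 31..0
        "ax": rax & 0xFFFF,            # bits 15..0
        "ah": (rax >> 8) & 0xFF,       # bits 15..8
        "al": rax & 0xFF,              # bits 7..0
    }


for name, value in sub_registers(0x1122334455667788).items():
    print(f"{name}: {value:#x}")
```

All five names read from the same storage, so writing AL changes the low byte seen through AX, EAX, and RAX as well.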

Instruction Pointer Register#

The instruction pointer register (RIP) contains the logical address of the next instruction to be executed.

Typically, after fetching an instruction, RIP advances to point to the next instruction. x86_64 instructions vary in length (1 to 15 bytes), so RIP advances by the length of the instruction just fetched rather than by a fixed amount.

However, RIP does not always advance sequentially; control-transfer instructions such as jmp, call, and ret change it. The call instruction pushes the current RIP content (the address of the instruction after the call) onto the stack and transfers control to the target function; the ret instruction pops the previously pushed 8-byte return address from the stack back into RIP.

Flags Register (EFLAGS)#

Assembly Language Instructions#

Common assembly instructions: mov, je, jmp, call, add, sub, inc, dec, and, or

Data Transfer Instructions#

Instruction	Name	Example	Remarks
MOV	Transfer	MOV dest, src	Move data from src to dest
PUSH	Push onto Stack	PUSH src	Push source operand src onto stack
POP	Pop from Stack	POP dest	Pop data from the top of the stack to dest

Arithmetic Operation Instructions#

Instruction	Name	Example	Remarks
ADD	Addition	ADD dest, src	Add src to dest
SUB	Subtraction	SUB dest, src	Subtract src from dest
INC	Increment	INC dest	Increment dest by 1
DEC	Decrement	DEC dest	Decrement dest by 1

Logical Operation Instructions#

Instruction	Name	Example	Remarks
NOT	Negation	NOT dest	Bitwise negation of operand dest
AND	AND Operation	AND dest, src	Perform AND operation on dest and src, store in dest
OR	OR Operation	OR dest, src	Perform OR operation on dest and src, store in dest
XOR	XOR Operation	XOR dest, src	Perform XOR operation on dest and src, store in dest

Loop Control Instructions#

Instruction	Name	Example	Remarks
LOOP	Counting Loop	LOOP label	Decrement ECX by 1, jump to label if ECX is not 0, otherwise execute the statement after LOOP

Transfer Instructions#

Instruction	Name	Example	Remarks
JMP	Unconditional Transfer	JMP label	Unconditionally transfer to the location labeled as label
CALL	Procedure Call	CALL label	Directly call label
JE	Conditional Transfer	JE label	Jump to the location labeled as label if zf = 1
JNE	Conditional Transfer	JNE label	Jump to the location labeled as label if zf = 0

Differences in Assembly between Linux and Windows#

The assembly syntax commonly seen on Linux and Windows is different. The difference is tied to the toolchain rather than to the operating system itself: gcc/g++ is generally used on Linux, while Microsoft's cl (driven by MSBuild) is used on Windows. gcc emits AT&T syntax by default, while Microsoft's tools use Intel syntax.

Difference	Intel	AT&T
Register Name Reference	eax	%eax
Operand Assignment Order	mov dest, src	movl src, dest
Prefix for Register and Immediate Instructions	mov ebx, 0xd00d	movl $0xd00d, %ebx
Register Indirect Addressing	[eax]	(%eax)
Data Type Size	Prefix with dword ptr, word ptr, byte ptr (mov dx, word ptr [eax])	Suffix letters added to the opcode: “l” for 32-bit, “w” for 16-bit, “b” for 8-bit (movb %bl, %al)

Addressing Modes#

Direct Addressing

Memory Addressing: [ ]

Overflow (Signed & Unsigned & Up/Down Overflow)#

Insufficient storage bits
Overflow into the sign bit

Integer overflow used in conjunction with other vulnerabilities

In my view, signed overflow is when a carry into the sign bit leaves the result with the wrong sign.
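Python integers never overflow, so fixed-width behavior has to be emulated with a mask. The sketch below assumes a 32-bit width; as_unsigned and as_signed are hypothetical helper names used only to show how the same bit pattern wraps around in each interpretation.

```python
BITS = 32
MASK = (1 << BITS) - 1


def as_unsigned(value: int) -> int:
    """Keep only the low 32 bits, as an unsigned register would."""
    return value & MASK


def as_signed(value: int) -> int:
    """Reinterpret the low 32 bits as a two's-complement signed integer."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


print(as_signed(0x7FFFFFFF + 1))    # signed upward overflow: wraps to the minimum
print(as_unsigned(0xFFFFFFFF + 1))  # unsigned upward overflow: wraps to 0
print(as_unsigned(0 - 1))           # unsigned downward overflow: wraps to the maximum
```

A length check such as "size < 100" can pass for a negative signed value that later becomes a huge unsigned size, which is one way integer overflow combines with other vulnerabilities.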

LINUX File Basics#

Protection Levels: 0-3

0 - Kernel

3 - User

Virtual Memory: The addresses a process uses are virtual addresses, which the MMU translates into physical memory addresses. The system allocates a separate virtual address space for each user process.

Big Endian and Little Endian#

Big Endian: High-order data -> low-order computer address (more in line with human reading habits)

Little Endian: Low-order data -> low-order computer address (counterintuitive, but more in line with storage logic and operation rules)

Computer outputs strings: from low address to high address

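Endianness is a property of the CPU architecture rather than of Linux: x86 and x86-64 are Little Endian, while ARM supports both modes and usually runs in Little Endian.

When inputting numbers as strings, pay attention to the byte order; on x86 the lowest-order byte goes at the lowest address, and pwntools can be used for conversion.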
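To see the byte order concretely, here is a small sketch using only the standard struct module. The names p64 and u64 mirror the pwntools helpers of the same name, but these are stand-in implementations written for illustration, and the address 0x400686 is an arbitrary example value.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer into 8 bytes, lowest-order byte first (Little Endian)."""
    return struct.pack("<Q", value)


def u64(data: bytes) -> int:
    """Unpack up to 8 Little Endian bytes, padding missing high bytes with zero."""
    return struct.unpack("<Q", data.ljust(8, b"\x00"))[0]


addr = 0x400686
print(p64(addr).hex())               # little endian layout in memory
print(struct.pack(">Q", addr).hex()) # big endian layout, for comparison
print(hex(u64(b"\x86\x06\x40")))     # a leaked 3-byte prefix turned back into a number
```

The padding in u64 is convenient when a leak stops at the first zero byte and returns fewer than 8 bytes.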

File Descriptors#

Each file descriptor corresponds to an open file.

0: stdin
1: stdout
2: stderr

stdin->buf->stdout

For example:

read(0, buf, size)

write(1, buf, size)
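The read/write pair above can be tried from Python through os.read and os.write, which are thin wrappers over the same system calls. This is a minimal sketch: copy_fd and demo are hypothetical names, and demo uses pipes in place of the terminal so that it runs without waiting for input.

```python
import os


def copy_fd(src_fd: int, dst_fd: int, size: int) -> int:
    """read(src_fd, buf, size) followed by write(dst_fd, buf, len(buf))."""
    buf = os.read(src_fd, size)          # returns at most `size` bytes
    written = 0
    while written < len(buf):            # write() may accept fewer bytes than asked
        written += os.write(dst_fd, buf[written:])
    return written


def demo() -> bytes:
    """Push b"hello\\n" through two pipes standing in for stdin and stdout."""
    src_r, src_w = os.pipe()
    dst_r, dst_w = os.pipe()
    try:
        os.write(src_w, b"hello\n")
        copy_fd(src_r, dst_w, 64)
        return os.read(dst_r, 64)
    finally:
        for fd in (src_r, src_w, dst_r, dst_w):
            os.close(fd)


print(demo())
```

Calling copy_fd(0, 1, 64) instead would echo one chunk of terminal input back to the terminal, which is the stdin->buf->stdout flow described above.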

Stack#

A stripped-down version of an array, can only be operated at one end.

Data structure: Last In, First Out (LIFO), same as function call order

Function execution order: main -> funA -> funB

Function completion order: funB -> funA -> main

Basic operations: push to stack, pop from stack

Function call instruction: call, return instruction: ret

The operating system sets a stack for each program, and each independent function of the program has its own stack frame.

In Linux, the stack grows from high addresses (bottom of the stack) to low addresses (top of the stack).

Many algorithms, such as DFS, utilize the stack and are implemented recursively.

Calling Convention#

What is a Calling Convention#

In the process of calling a function, there are two participants: the caller and the callee.

Calling conventions specify how the caller and callee cooperate to implement function calls, including the following details:

Where to store function parameters. Are they placed in registers? Or in the stack? Which registers? Which positions in the stack?
The order of parameter passing. Are parameters pushed onto the stack from left to right, or from right to left?
How return values are passed back to the caller. Are they placed in registers or elsewhere?
Etc.

So, why do we need calling conventions?

For example, if we write code in assembly language without a unified standard to follow, then A might habitually place parameters on the stack, B might place them in registers, C might have another habit... Each person writes code according to their own ideas. Thus, when A tries to call someone else's code, they must follow the other person's habits. For instance, when calling B, A needs to place parameters in the registers specified by B; calling C would be another case...

Calling conventions exist to solve the above problems. They specify the details of function calls so that everyone adheres to a convention, allowing us to call code written by others without needing to make modifications.

Function Call Stack#

Function Call: When a function is called, the program allocates a new stack frame for it on the call stack. The stack frame contains the function's parameters, local variables, return address, and other information.
Parameter Passing: During the function call, parameters are passed to the called function in registers or via stack push operations, depending on the calling convention. Parameters passed on the stack are stored in the stack frame for use within the function.
Function Execution: The called function begins execution, using the parameters and local variables from the stack frame. The execution process of the function may involve complex logic and calculations.
Return Value Handling: When the function completes, the program returns to the code location that called the function. This location is specified by the return address in the stack frame. If the function has a return value, it is passed back in a register (RAX on x86-64) rather than pushed onto the stack.
Stack Frame Destruction: Once the function call is complete, its corresponding stack frame is popped from the call stack and destroyed, freeing the memory resources it occupied.

Specific Function Call Process#

pop

The effect of pop rax:

mov rax, [rsp]; // Copy the data at the top of the stack into the register

add rsp, 8; // Move the stack pointer 8 bytes toward higher addresses, shrinking the stack

push

The effect of push rax:

sub rsp, 8; // Move the stack pointer 8 bytes toward lower addresses, growing the stack

mov [rsp], rax; // Place the register's value at the top of the stack

jmp

Immediate jump, does not involve function calls, used for loops, if-else

For example, the effect of jmp 1234h:

mov rip, 1234h;

call

Function call, requires saving the return address

For example, the effect of call 1234h:

push rip; // rip already points to the instruction after the call

mov rip, 1234h;

ret

Function return, restores the saved return address

The effect of ret:

pop rip;
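The push, pop, call, and return effects described in this section can be modelled in a few lines of Python. This is a toy model under stated assumptions, not how a real CPU is built: TinyCPU is a hypothetical class, memory is a dictionary of 8-byte slots, and rip holds the address of the call instruction itself, so the return address is computed by adding the instruction length (5 bytes for a near call with a 32-bit offset).

```python
class TinyCPU:
    """Toy model of rsp, rip, and a sparse stack made of 8-byte slots."""

    def __init__(self, rsp: int = 0x7FFFFFFFE000, rip: int = 0x401000):
        self.rsp = rsp
        self.rip = rip
        self.mem = {}

    def push(self, value: int) -> None:
        self.rsp -= 8                  # stack grows toward lower addresses
        self.mem[self.rsp] = value

    def pop(self) -> int:
        value = self.mem[self.rsp]
        self.rsp += 8                  # stack shrinks toward higher addresses
        return value

    def jmp(self, target: int) -> None:
        self.rip = target              # nothing is saved

    def call(self, target: int, insn_len: int = 5) -> None:
        self.push(self.rip + insn_len) # return address = next instruction
        self.rip = target

    def ret(self) -> None:
        self.rip = self.pop()          # whatever is on top of the stack becomes rip


cpu = TinyCPU(rsp=0x1000, rip=0x401000)
cpu.call(0x1234)
print(hex(cpu.rip), hex(cpu.rsp), hex(cpu.mem[cpu.rsp]))
cpu.ret()
print(hex(cpu.rip), hex(cpu.rsp))
```

The last method is the reason stack overflows matter: the return step trusts whatever value sits at the top of the stack, so overwriting that slot redirects execution.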

Example: main calls funB, funB calls funA, analyze the stack frame changes step by step:

During the function call process:

Calling Function:
- Push rip (the address of the instruction after the call) onto the stack as the return address. (call)
Called Function:
- Push the caller's rbp onto the stack to save it. (push rbp)
- Assign the value of rsp to rbp, making rbp point to the bottom of the current stack frame. (mov rbp, rsp)
- Allocate stack space for local variables and temporary data, reducing rsp by the appropriate size. (sub rsp, N)
- Use rbp as a base pointer to access function parameters and local variables.

When the function returns: leave; ret;

Called Function:
- Release the local variables and temporary data by setting rsp back to rbp, then pop the saved rbp to restore the caller's base pointer. (leave)
- Pop the return address from the stack into rip. (ret)
Calling Function:
- Resume execution at the return address, with rsp back at its value from before the call.

Illustration of Stack Frame Changes:

(high addresses)
+--------------------------------------+
| main function stack frame            |
+--------------------------------------+
| Return Address (back into main)      |
| Saved rbp (Base Pointer of main)     |
+--------------------------------------+
| funB stack frame (Local Variables)   |
+--------------------------------------+
| Return Address (back into funB)      |
| Saved rbp (Base Pointer of funB)     |
+--------------------------------------+
| funA stack frame (Local Variables)   |
+--------------------------------------+
(low addresses, rsp points here)

How to Pass Parameters#

Function return value goes to RAX.

The calling convention for x86-64 functions on Linux (System V AMD64 ABI) is:

The first six integer or pointer parameters are passed from left to right in RDI, RSI, RDX, RCX, R8, R9.
If a function has more than 6 parameters, the additional ones are pushed onto the stack from right to left.
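The two rules above can be written out as a small Python sketch. It only covers integer or pointer arguments, and place_arguments is a hypothetical helper used for illustration, not part of any tool.

```python
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def place_arguments(args: list) -> tuple:
    """Return (register assignments, order in which extra arguments are pushed)."""
    regs = dict(zip(INT_ARG_REGS, args))       # first six, left to right
    stack_pushes = list(reversed(args[6:]))    # the rest, pushed right to left
    return regs, stack_pushes


regs, pushes = place_arguments([1, 2, 3, 4, 5, 6, 7, 8])
print(regs)
print(pushes)
```

Because the extra arguments are pushed right to left, the seventh argument ends up at the lowest address, closest to the return address. When building a ROP chain, this is why a "pop rdi; ret" gadget is the usual way to set a function's first argument.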

System Calls#

syscall Instruction#

Used to invoke kernel services, selected by a system call number (which can be looked up in the 64-bit Linux system call table).

The system call number is stored in the RAX register and the parameters go in RDI, RSI, RDX, R10, R8, R9 (note R10 instead of RCX); then execute syscall. The return value comes back in RAX.

Example: calling read(0, buf, size)

mov rax, 0;
mov rdi, 0;
mov rsi, buf;
mov rdx, size;
syscall;

ELF File Structure#

ELF File Format#

ELF (Executable and Linkable Format) is the binary executable file format in Linux.

ELF Header#

The command readelf -h can read the ELF file header. The ELF header includes the program's entry point (Entry Point Address), segment information, and section information. From the ELF header's Start of program headers and Start of section headers, the locations of the segment table and section table in the file can be located.
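The fields that readelf -h prints sit at fixed offsets in the first 64 bytes of a 64-bit ELF file, so they can be read with the standard struct module. The sketch below handles only 64-bit Little Endian files; parse_elf64_header is a hypothetical helper name, while the field names follow the ELF specification.

```python
import struct

# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

FIELD_NAMES = (
    "e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
    "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
    "e_shnum", "e_shstrndx",
)


def parse_elf64_header(data: bytes) -> dict:
    """Decode the ELF header fields from the first 64 bytes of a file."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("need at least 64 bytes")
    fields = ELF64_HEADER.unpack_from(data)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")
    return dict(zip(FIELD_NAMES, fields[1:]))
```

For a real binary, something like parse_elf64_header(open("test", "rb").read(64)) returns the entry point as e_entry, the program header table offset as e_phoff, and the section header table offset as e_shoff.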

Section Header Table#

Use the command readelf -S to read the section information (sections) of the binary ELF file. The program test has a total of 31 sections. Assembly language is written according to sections, such as the .text section and .data section. Assembly code corresponds one-to-one with machine code, and the section information is retained when the assembly program is converted into binary code.

readelf -S test

Program Header Table#

When an ELF program is executed (loaded into memory), the loader creates the process's memory image based on the program's segment table. Use the command readelf -l to read the segment information (segments) of the binary ELF file. The program test has a total of 13 segments, fewer than its 31 sections, so multiple sections may map to the same segment.

Based on the permissions of the sections: readable and writable sections are mapped into one segment, read-only sections are mapped into another segment, and so on.

readelf -l test

Linking View/Execution View#

Segment and Section are two different perspectives on the same ELF file. This is referred to as different views in ELF.

From the perspective of Section, the ELF file is the linking view.
From the perspective of Segment, it is the execution view.
Note that the word "segment" is used loosely: when discussing ELF loading it specifically means a Segment; in other contexts it often refers to a Section.

libc#

glibc: GNU C Library, glibc itself is the C standard library under GNU, which gradually became the standard C library for Linux.

The shared library file is named libc.so, usually with a version suffix such as libc.so.6. It is essentially an ELF file and can even be executed directly. The dynamic link libraries encountered in pwn challenges are typically libc.so files.

Almost all programs in Linux depend on libc, so the functions in libc are crucial.

Lazy Binding Mechanism#

Static Compilation vs Dynamic Compilation#

A dynamically linked executable depends on shared libraries (dynamic link libraries) at run time and calls functions from them during execution. Its advantages are a smaller executable, faster linking, and lower resource use, since one copy of the library is shared. Its disadvantages are that even a very simple program that uses only one or two library functions still needs the whole, relatively large library to be present; if the required library is not installed on another computer, the executable cannot run there.

Static compilation means that at link time the linker extracts the necessary parts from the corresponding static library (.a) and links them into the executable file, so the executable does not depend on the shared library (.so) at run time. Its advantages and disadvantages are the reverse of those of dynamically linked executables.

Lazy Binding#

Using lazy binding is based on the premise that in dynamic linking, the modules loaded by the program contain a large number of function calls.

Lazy binding postpones the binding of function addresses until the first call to that function, thus avoiding the dynamic linker from processing a large number of function reference relocations during loading.

The implementation of lazy binding uses two special data structures: the Global Offset Table (GOT) and the Procedure Linkage Table (PLT).

Global Offset Table (GOT)#

After the library function is called for the first time, the program saves its address in the GOT table.

The Global Offset Table exists as separate sections in the ELF file. There are two of them, named .got and .got.plt: .got stores the addresses for all external variable references, and .got.plt stores the addresses for all external function references. Lazy binding primarily uses the .got.plt table.

Among them, the first three items in .got.plt store special address references:

GOT[0]: Stores the address of the .dynamic section, which the dynamic linker uses to extract dynamic linking-related information;
GOT[1]: Stores the address of this module's link_map structure, which serves as the module ID;
GOT[2]: Stores the address pointing to the dynamic linker's _dl_runtime_resolve function, which is used to resolve the actual symbol addresses of shared library functions.

Procedure Linkage Table (PLT)#

To implement lazy binding, when calling a function from an external module, the program does not jump directly through the GOT but instead jumps through a specific entry stored in the PLT table. For all external functions, there will be a corresponding entry in the PLT table, where each entry contains 16 bytes of code used to call a specific function.

In addition to containing the PLT entries created by the compiler for the external functions called, the PLT also contains a special entry, PLT[0], which is used to jump to the dynamic linker for actual symbol resolution and relocation work.

PLT and GOT#

Regardless of how many times an external function is called, the program actually calls the PLT table, which is composed of a series of assembly instructions.
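The resolve-once behavior of lazy binding can be imitated with a dictionary that is filled in on first use. This is only a conceptual sketch: LazyBinder is a hypothetical class, the dictionary stands in for .got.plt, and the library argument stands in for libc's symbol table. In a real binary the unresolved GOT slot points back into the PLT stub, which jumps to PLT[0] and from there into the dynamic linker.

```python
class LazyBinder:
    """Toy model of PLT/GOT lazy binding: one GOT slot per external function."""

    def __init__(self, library: dict):
        self.library = library      # stands in for the shared library's symbols
        self.got = {}               # name -> resolved callable, empty at startup
        self.resolve_count = 0

    def _dl_runtime_resolve(self, name: str):
        """Look the symbol up once and patch the GOT slot with the result."""
        self.resolve_count += 1
        self.got[name] = self.library[name]
        return self.got[name]

    def plt_call(self, name: str, *args):
        """Every call goes through the PLT; only the first one hits the resolver."""
        target = self.got.get(name)
        if target is None:
            target = self._dl_runtime_resolve(name)
        return target(*args)


binder = LazyBinder({"strlen": len})
print(binder.plt_call("strlen", "abc"), binder.resolve_count)
print(binder.plt_call("strlen", "hello"), binder.resolve_count)
```

Since later calls trust whatever the GOT slot holds, overwriting that slot redirects every subsequent call, which is the idea behind GOT hijacking.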

So, one might wonder: why does PLT exist, and why not go directly to GOT?

This is like having many relatives; you need to visit them every week, so you write down their addresses in a notebook. When you want to visit, you look it up in the notebook. That notebook is like a PLT table, where each address jumps to the corresponding GOT address (your relatives' homes).

If one day you find it troublesome to run around, you invite all your relatives to live at your house, and now you only need to visit the corresponding room. The notebook becomes useless, and you throw it away. This is when you directly access the GOT without a PLT table.

Which do you think takes up more space, a notebook or a house full of relatives?

This is one reason for the existence of the PLT table: to utilize memory more efficiently.

Another reason is to enhance security.

LINUX Security Protection Mechanisms#

Detailed Explanation of GCC Security Compilation Options (NX(DEP), RELRO, PIE(ASLR), CANARY, FORTIFY)_gcc pie-CSDN Blog

CANARY#

Canary is a protection mechanism against stack overflow attacks. When a function's stack frame is created, an 8-byte random value (the canary), whose lowest byte is \x00, is copied from fs:0x28 and placed on the stack just below the saved rbp, between the local variables and the saved rbp. When an attacker attempts to overwrite the saved rbp or the return address above it through a buffer overflow, they will inevitably overwrite the canary first. Before the function returns, it checks whether the canary on the stack still matches the original value; if not, __stack_chk_fail is called and the program aborts, thus stopping the attack.

Bypass Methods:

Modify the canary.
Leak the canary.

Canary Bypass#

Format String Bypass
- Read the value of the canary through format strings.
Canary Brute Force (for programs with fork function)
- The fork function acts like self-replication; each time a program is copied, the memory layout is the same, and the value of the canary is also the same. Thus, we can brute-force it byte by byte. If the child process crashes, the guessed byte is incorrect; if it runs normally, the byte is correct and we can continue to the next byte until the whole canary is known.
Stack Smashing (deliberately triggering canary_ssp leak)
Hijacking __stack_chk_fail
- Modify the address of the __stack_chk_fail function in the GOT table. After a stack overflow, execute this function, but since its address has been modified, the program will jump to the address we want to execute.
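The byte-by-byte brute force against a forking program can be rehearsed offline against a simulated target. In this sketch make_oracle and brute_force_canary are hypothetical names: the oracle plays the role of "send an overflow that overwrites only the first few canary bytes and see whether the child survives", and no real process is attacked.

```python
def make_oracle(canary: bytes):
    """Simulate a forked child: it survives only if the overwritten bytes match."""
    def oracle(guess: bytes) -> bool:
        return canary[:len(guess)] == guess
    return oracle


def brute_force_canary(oracle, length: int = 8) -> bytes:
    """Recover the canary one byte at a time, at most 256 tries per byte."""
    known = b"\x00"                       # the lowest byte is always \x00
    while len(known) < length:
        for candidate in range(256):
            if oracle(known + bytes([candidate])):
                known += bytes([candidate])
                break
        else:
            raise RuntimeError("no byte value survived; the oracle is unreliable")
    return known


secret = b"\x00" + bytes([0x11, 0xA2, 0x33, 0xB4, 0x55, 0xC6, 0x77])
print(brute_force_canary(make_oracle(secret)).hex())
```

Guessing one byte at a time needs at most 7 * 256 attempts for the remaining bytes, instead of 2^56 for guessing them all at once.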

NX#

With NX enabled, data on the stack is not executable. Writable regions such as the heap, stack, and .bss segment in the program cannot be executed.

Bypass Methods:

Use the mprotect function to modify segment permissions. NX also does not affect ROP or GOT hijacking, since both reuse code that is already executable.

PIE and ASLR#

What is ASLR?#

ASLR is a feature option of the Linux operating system that applies when a program (ELF) is loaded into memory. It is a security protection technology against buffer overflow exploitation: it randomizes load addresses so that attackers cannot directly locate the position of the attack code, making overflow attacks much harder to carry out.

Enabling and Disabling ASLR#

Check the current ASLR status of the system:

sudo cat /proc/sys/kernel/randomize_va_space

ASLR has three security levels:

0: ASLR is off
1: Randomizes the stack base address (stack), shared libraries (.so libraries), mmap base address
2: On top of 1, adds randomization of the heap base address (chunk)

What is PIE?#

PIE is a feature option of the gcc compiler that applies during the compilation of a program (ELF). It is a protection technology against fixed addresses for code segments (.text), data segments (.data), and uninitialized global variable segments (.bss). If a program has PIE enabled and ASLR is on, the load address changes each time the program is loaded, so the absolute addresses of gadgets found by tools like ROPgadget are unknown until the base address is leaked.

Enabling PIE#

Add the parameters -fPIE -pie when compiling and linking with gcc.

When PIE is enabled, it randomizes the loading addresses of the code segment (.text), initialized data segment (.data), and uninitialized data segment (.bss).

PIE Bypass#

A program is loaded in units of memory pages (4 KiB, 0x1000 bytes), so the last three hexadecimal digits of the program's base address must be 0. This means that the last three hex digits of any address inside the program equal the last three hex digits of its known offset in the file. Knowing this, we have a way to bypass PIE: although we do not know the complete address, we can take an address already on the stack and overwrite only its lowest byte or its lowest two bytes (four hex digits). In the two-byte case the fourth digit is still random, so a guess succeeds about 1 time in 16.

Thus, the core idea of bypassing PIE is partial writing (partial address writing).
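The arithmetic behind a two-byte partial overwrite can be checked with a short Python sketch. Every number here is made up for illustration: the base address, the saved return address, and the target offset 0x11A9 are assumptions, and both helper functions are hypothetical names.

```python
def partial_overwrite_candidates(target_offset: int) -> list:
    """Build the 16 possible two-byte payloads, one per value of the unknown digit."""
    low12 = target_offset & 0xFFF          # these three hex digits are always known
    return [((digit << 12) | low12).to_bytes(2, "little") for digit in range(16)]


def apply_partial_overwrite(original_addr: int, payload: bytes) -> int:
    """Replace the lowest len(payload) bytes of an address, as the overflow would."""
    bits = 8 * len(payload)
    return (original_addr >> bits << bits) | int.from_bytes(payload, "little")


base = 0x55D4C3A1B000                      # page-aligned, unknown to the attacker
saved_ret = base + 0x1234                  # return address already on the stack
target = base + 0x11A9                     # where we want execution to go

hits = [p for p in partial_overwrite_candidates(0x11A9)
        if apply_partial_overwrite(saved_ret, p) == target]
print(len(hits), hits[0].hex())
```

Exactly one of the 16 candidates lands on the target for any given run, which is why the exploit is simply retried until the random digit happens to match.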

RELRO#

ReLocation Read-Only is a technique used to enhance the protection of binary data segments.