Assembly Basics#
Mastery Level: Understand assembly, analyze gadgets, step through debugging to understand register states
Reference Article: x86_64 Assembly Part 1: AT&T Assembly Syntax_x86_64 Assembly at&t-CSDN Blog
Registers#
The x86 CPU has 8 commonly used registers: EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP.
The CPU prioritizes reading and writing registers, then exchanges data through registers, caches, and memory to achieve buffering. Accessing registers by name is the fastest method, hence they are also referred to as zero-level cache.
Access speeds from high to low are: registers > Level 1 cache > Level 2 cache > Level 3 cache > memory > hard disk
General Registers and Their Uses#
The 8 registers mentioned above each have a conventional purpose. Taking a 32-bit CPU as an example, here is a brief description of these registers:
Register | Meaning | Purpose | Contains Registers |
---|---|---|---|
EAX | Accumulator Register | Commonly used for multiplication, division, and function return values | AX(AH, AL) |
EBX | Base Register | Often used as a pointer for memory data, or as a base for accessing memory | BX(BH, BL) |
ECX | Counter Register | Commonly used as a counter in string and loop operations | CX(CH, CL) |
EDX | Data Register | Commonly used for multiplication, division, and I/O pointers | DX(DH, DL) |
ESI | Source Index Register | Commonly used as a pointer for memory data and source strings | SI |
EDI | Destination Index Register | Commonly used as a pointer for memory data and destination strings | DI |
ESP | Stack Pointer Register | Always points to the top of the stack; by convention it is reserved for this role and is not used for general arithmetic or data transfer | SP |
EBP | Base Pointer Register | Usually holds the base of the current stack frame (initialized from ESP), so that any address in the stack can be accessed relative to it; by convention it is not used for general arithmetic or data transfer | BP |
Each register in the table above also has other names: rax, eax, ax, ah, and al all refer to the same register, just covering different bit ranges.
Below is the correspondence for 64-bit registers:
Name | Bits | Width |
---|---|---|
RAX | 63..0 | 64 bits |
EAX | 31..0 | 32 bits |
AX | 15..0 | 16 bits |
AH | 15..8 | 8 bits |
AL | 7..0 | 8 bits |
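As a rough illustration of the aliasing described above, the sketch below masks a 64-bit value to show what each narrower name would read. The helper name `sub_registers` and the sample value are made up for this example.

```python
def sub_registers(rax: int) -> dict:
    """Split a 64-bit RAX value into the narrower names that alias it."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep only 64 bits
    return {
        "rax": rax,                    # bits 63..0
        "eax": rax & 0xFFFFFFFF,       # bits 31..0
        "ax": rax & 0xFFFF,            # bits 15..0
        "ah": (rax >> 8) & 0xFF,       # bits 15..8
        "al": rax & 0xFF,              # bits 7..0
    }


regs = sub_registers(0x1122334455667788)
for name, value in regs.items():
    print(f"{name} = {value:#x}")
```

Reading `al` or `ah` never touches a separate storage location; it is only a narrower view of the same 64 bits.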
Instruction Pointer Register#
The instruction pointer register (RIP) contains the logical address of the next instruction to be executed.
Typically, after fetching an instruction, RIP advances to point to the next instruction. In x86_64, instructions are variable-length (1 to 15 bytes), so RIP advances by the length of the instruction just fetched rather than by a fixed offset.
However, RIP does not always advance sequentially; there are exceptions, such as the call and ret instructions. The call instruction pushes the current RIP content (the address of the instruction following the call) onto the stack and transfers control to the target function; the ret instruction pops the previously pushed 8-byte RIP address from the stack back into RIP.
Flags Register (EFLAGS)#
Assembly Language Instructions#
Common assembly instructions: mov, je, jmp, call, add, sub, inc, dec, and, or
Data Transfer Instructions#
Instruction | Name | Example | Remarks |
---|---|---|---|
MOV | Transfer | MOV dest, src | Move data from src to dest |
PUSH | Push onto Stack | PUSH src | Push source operand src onto stack |
POP | Pop from Stack | POP dest | Pop data from the top of the stack to dest |
Arithmetic Operation Instructions#
Instruction | Name | Example | Remarks |
---|---|---|---|
ADD | Addition | ADD dest, src | Add src to dest |
SUB | Subtraction | SUB dest, src | Subtract src from dest |
INC | Increment | INC dest | Increment dest by 1 |
DEC | Decrement | DEC dest | Decrement dest by 1 |
Logical Operation Instructions#
Instruction | Name | Example | Remarks |
---|---|---|---|
NOT | Negation | NOT dest | Bitwise negation of operand dest |
AND | AND Operation | AND dest, src | Perform AND operation on dest and src, store in dest |
OR | OR Operation | OR dest, src | Perform OR operation on dest and src, store in dest |
XOR | XOR Operation | XOR dest, src | Perform XOR operation on dest and src, store in dest |
Loop Control Instructions#
Instruction | Name | Example | Remarks |
---|---|---|---|
LOOP | Counting Loop | LOOP label | Decrement ECX by 1, jump to label if ECX is not 0, otherwise execute the statement after LOOP |
Transfer Instructions#
Instruction | Name | Example | Remarks |
---|---|---|---|
JMP | Unconditional Transfer | JMP label | Unconditionally transfer to the location labeled as label |
CALL | Procedure Call | CALL label | Directly call label |
JE | Conditional Transfer | JE label | Jump to the location labeled as label if the zero flag ZF = 1 |
JNE | Conditional Transfer | JNE label | Jump to the location labeled as label if the zero flag ZF = 0 |
Differences in Assembly between Linux and Windows#
The assembly syntax commonly seen on Linux and Windows is different. The difference is not strictly tied to the operating system; rather, the gcc/g++ compilers are generally used on Linux, while Microsoft's cl compiler (usually driven by MSBuild) is used on Windows. The different code is therefore due to the different toolchains: gcc uses the AT&T assembly syntax by default, while Microsoft's tools use the Intel assembly syntax.
Difference | Intel | AT&T |
---|---|---|
Register Name Reference | eax | %eax |
Operand Assignment Order | mov dest, src | movl src, dest |
Prefix for Register and Immediate Instructions | mov ebx, 0xd00d | movl $0xd00d, %ebx |
Register Indirect Addressing | [eax] | (%eax) |
Data Type Size | Operand prefixed with dword ptr, word ptr, or byte ptr (mov dx, word ptr [eax]) | Suffix letter added to the opcode: “l” for 32-bit, “w” for 16-bit, “b” for 8-bit (movb %bl, %al) |
Addressing Modes#
Direct Addressing
Memory Addressing: [ ]
Overflow (Signed & Unsigned & Up/Down Overflow)#
- Insufficient storage bits
- Overflow into the sign bit
Integer overflow used in conjunction with other vulnerabilities
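In my opinion, a carry into the sign bit represents overflow. More precisely, unsigned overflow shows up as a carry out of the highest bit (the CF flag), while signed overflow means the sign bit of the result is wrong for the operands (the OF flag).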
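To make the signed and unsigned cases concrete, here is a minimal sketch that emulates a 32-bit ADD in Python. The function names `add32` and `to_signed32` are invented for this example; the flag rules follow the usual CF/OF definitions.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - 0x100000000 if x & SIGN32 else x


def add32(a: int, b: int):
    """Return (result, carry, overflow) for a 32-bit addition."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    carry = full > MASK32                        # unsigned overflow (CF)
    # Signed overflow (OF): the result's sign differs from both operands' signs.
    overflow = ((a ^ result) & (b ^ result) & SIGN32) != 0
    return result, carry, overflow


# INT_MAX + 1 wraps to INT_MIN: signed overflow, no carry.
print(add32(0x7FFFFFFF, 1), to_signed32(0x80000000))
# UINT_MAX + 1 wraps to 0: carry, but no signed overflow (-1 + 1 = 0).
print(add32(0xFFFFFFFF, 1))
```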
LINUX File Basics#
Protection Levels: 0-3
0 - Kernel
3 - User
Virtual Memory: The addresses a process uses are virtual; the MMU translates them into physical memory addresses. The system allocates a separate virtual memory space for each user process.
Big Endian and Little Endian#
Big Endian: High-order data -> low-order computer address (more in line with human reading habits)
Little Endian: Low-order data -> low-order computer address (counterintuitive, but more in line with storage logic and operation rules)
Computer outputs strings: from low address to high address
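Linux on x86/x86_64 stores data in little-endian format. ARM can be configured for either byte order and usually also runs little-endian; big-endian is mainly seen in network byte order and on some other architectures.
When sending numbers as raw bytes, pay attention to the byte order: on a little-endian machine the lowest byte comes first, and pwntools (for example p64 and u64) can be used for the conversion.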
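The byte orders can be checked with the standard `struct` module. This is only a sketch: `pack64_le` and `unpack64_le` are names invented here, and they mimic what pwntools' `p64`/`u64` do by default.

```python
import struct


def pack64_le(value: int) -> bytes:
    """Pack an integer into 8 little-endian bytes."""
    return struct.pack("<Q", value)


def unpack64_le(data: bytes) -> int:
    """Unpack 8 little-endian bytes back into an integer."""
    return struct.unpack("<Q", data)[0]


value = 0xDEADBEEF
little = struct.pack("<I", value)   # lowest byte at the lowest address
big = struct.pack(">I", value)      # highest byte at the lowest address
print(little.hex(), big.hex())

addr = 0x401136                     # hypothetical address, for illustration only
print(pack64_le(addr))
```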
File Descriptors#
Each file descriptor corresponds to an open file.
- 0: stdin
- 1: stdout
- 2: stderr
stdin->buf->stdout
For example:
read(0, buf, size)
write(1, buf, size)
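The same read/write pattern can be tried from Python through `os.read` and `os.write`, which take raw file descriptors. The sketch below uses pipes instead of the real stdin/stdout so it runs without user input; `echo_once` is a made-up helper name.

```python
import os


def echo_once(fd_in: int, fd_out: int, size: int = 64) -> bytes:
    """Read up to `size` bytes from fd_in and write them to fd_out."""
    buf = os.read(fd_in, size)      # like read(fd_in, buf, size)
    os.write(fd_out, buf)           # like write(fd_out, buf, len)
    return buf


# Two pipes stand in for stdin (fd 0) and stdout (fd 1).
in_r, in_w = os.pipe()
out_r, out_w = os.pipe()
os.write(in_w, b"hello fd\n")
echoed = echo_once(in_r, out_w)
received = os.read(out_r, 64)
for fd in (in_r, in_w, out_r, out_w):
    os.close(fd)
print(received)
```

Calling `echo_once(0, 1)` instead would echo one chunk of terminal input.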
Stack#
A stripped-down version of an array, can only be operated at one end.
Data structure: Last In, First Out (LIFO), same as function call order
Function execution order: main -> funA -> funB
Function completion order: funB -> funA -> main
Basic operations: push to stack, pop from stack
Function call instruction: call, return instruction: ret
The operating system sets a stack for each program, and each independent function of the program has its own stack frame.
In Linux, the stack grows from high addresses (the bottom of the stack) toward low addresses (the top of the stack).
Many algorithms, such as DFS, utilize the stack and are implemented recursively.
Calling Convention#
What is a Calling Convention#
In the process of calling a function, there are two participants: the caller and the callee.
Calling conventions specify how the caller and callee cooperate to implement function calls, including the following details:
- Where to store function parameters. Are they placed in registers? Or in the stack? Which registers? Which positions in the stack?
- The order of parameter passing. Are parameters pushed onto the stack from left to right, or from right to left?
- How return values are passed back to the caller. Are they placed in registers or elsewhere?
- Etc.
So, why do we need calling conventions?
For example, if we write code in assembly language without a unified standard to follow, then A might habitually place parameters on the stack, B might place them in registers, C might have another habit... Each person writes code according to their own ideas. Thus, when A tries to call someone else's code, they must follow the other person's habits. For instance, when calling B, A needs to place parameters in the registers specified by B; calling C would be another case...
Calling conventions exist to solve the above problems. They specify the details of function calls so that everyone adheres to a convention, allowing us to call code written by others without needing to make modifications.
Function Call Stack#
- Function Call: When a function is called, the program allocates a new stack frame for it on the call stack. The stack frame contains the function's parameters, local variables, return address, and other information.
- Parameter Passing: During the function call, parameters are passed to the called function either in registers or via stack push operations, depending on the calling convention. Parameters passed on the stack are stored in the stack frame for use within the function.
- Function Execution: The called function begins execution, using the parameters and local variables from the stack frame. The execution process of the function may involve complex logic and calculations.
- Return Value Handling: When the function completes, the program returns to the code location that called the function. This location is specified by the return address in the stack frame. If the function has a return value, that value is passed back to the caller in a register (RAX on x86-64) rather than being pushed onto the stack.
- Stack Frame Destruction: Once the function call is complete, its corresponding stack frame is popped from the call stack and destroyed, freeing the memory resources it occupied.
Specific Function Call Process#
- pop
The effect of pop rax:
mov rax, [rsp];
// Copy the data at the top of the stack into the register
add rsp, 8;
// Move the stack pointer 8 bytes toward higher addresses (the stack shrinks)
- push
The effect of push rax:
sub rsp, 8;
// Move the stack pointer 8 bytes toward lower addresses (the stack grows)
mov [rsp], rax;
// Place the register's value at the new top of the stack
- jmp
Immediate jump, does not involve function calls, used for loops, if-else
For example, the effect of jmp 1234h:
mov rip, 1234h;
// Pseudo-instruction: rip cannot actually be written with mov
- call
Function call, requires saving the return address
For example, the effect of call 1234h:
push rip;
// rip already points to the instruction after the call
mov rip, 1234h;
- ret
The effect of ret:
pop rip;
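The effects listed above can be modelled with a tiny simulator. This is a sketch, not real CPU behaviour: the class name `TinyCPU`, the starting stack address, and the sample code addresses are all made up.

```python
class TinyCPU:
    """Toy model of rsp, rip, and a stack of 8-byte slots."""

    def __init__(self, rsp: int = 0x7FFFFFFFE000):
        self.rsp = rsp
        self.rip = 0
        self.mem = {}                 # address -> 8-byte value

    def push(self, value: int) -> None:
        self.rsp -= 8                 # the stack grows toward lower addresses
        self.mem[self.rsp] = value

    def pop(self) -> int:
        value = self.mem[self.rsp]
        self.rsp += 8                 # the stack shrinks toward higher addresses
        return value

    def call(self, target: int, next_rip: int) -> None:
        self.push(next_rip)           # save the return address
        self.rip = target

    def ret(self) -> None:
        self.rip = self.pop()         # resume after the call site


cpu = TinyCPU()
start_rsp = cpu.rsp
cpu.call(0x1234, next_rip=0x401005)
rsp_in_callee, rip_in_callee = cpu.rsp, cpu.rip
cpu.ret()
print(hex(rip_in_callee), hex(cpu.rip), cpu.rsp == start_rsp)
```

Overwriting the slot that `call` pushed before `ret` runs is exactly how a stack overflow redirects execution.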
Example: main calls funB, funB calls funA, analyze the stack frame changes step by step:
During the function call process:
- Calling Function:
  - Push rip onto the stack as the return address. (call)
- Called Function:
  - Push rbp onto the stack, saving the caller's base pointer.
  - Assign the value of rsp to rbp, making rbp point to the bottom of the current stack frame.
  - Allocate stack space for local variables and temporary data, reducing rsp by the appropriate size.
  - Use rbp as a base pointer to access function parameters and local variables.
When the function returns: leave; ret;
- Called Function (leave, equivalent to mov rsp, rbp; pop rbp):
  - Release the local variables and temporary data allocated on the stack.
  - Restore rsp and rbp to their values at the time of the function call.
- Back to the Calling Function (ret):
  - Pop the return address from the stack.
  - Update rip with the return address.
Illustration of Stack Frame Changes:
+-------------------------------------+
| main function stack frame           |
+-------------------------------------+
| Return Address                      |
| rbp (Base Pointer of main)          |
+-------------------------------------+
| funB's calling function stack frame |
+-------------------------------------+
| Return Address                      |
| rbp (Base Pointer of funB)          |
+-------------------------------------+
| funA's called function stack frame  |
+-------------------------------------+
| rbp (Base Pointer of funA)          |
| Local Variables                     |
+-------------------------------------+
How to Pass Parameters#
Function return value goes to RAX.
The calling convention for x86-64 functions (the System V AMD64 ABI used on Linux) is:
- Integer and pointer parameters are passed from left to right in RDI, RSI, RDX, RCX, R8, R9.
- If a function has more than 6 parameters, the additional ones are pushed onto the stack from right to left.
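The register-then-stack rule above can be sketched as a small helper. `place_arguments` is an invented name, and the model covers only integer and pointer arguments (floating-point arguments use different registers).

```python
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def place_arguments(args):
    """Return (register assignment, stack push order) for integer arguments."""
    regs = dict(zip(INT_ARG_REGS, args))        # first six, left to right
    stack_push_order = list(reversed(args[6:]))  # the rest, right to left
    return regs, stack_push_order


regs, pushes = place_arguments([1, 2, 3, 4, 5, 6, 7, 8])
print(regs)
print(pushes)
```

Because the extra arguments are pushed right to left, the seventh argument ends up closest to the top of the stack when the callee starts.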
System Calls#
syscall Instruction#
Used to call system functions, specifying the system call number (which can be looked up in the 64-bit Linux system call table).
The system call number is stored in the RAX register; the parameters are then set up in registers and the syscall instruction is executed.
Example: calling read(0, buf, size)
mov rax, 0;
mov rdi, 0;
mov rsi, buf;
mov rdx, size;
syscall;
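As a sketch of the register setup, the helper below maps a system call name and its arguments to register values. The function name `syscall_registers` is made up; the numbers are a small excerpt of the x86-64 Linux table. Note that system calls pass the fourth argument in R10 rather than RCX.

```python
# Excerpt of the x86-64 Linux system call numbers.
SYSCALL_NUMBERS = {
    "read": 0,
    "write": 1,
    "open": 2,
    "mprotect": 10,
    "execve": 59,
    "exit": 60,
}

# Argument registers for the syscall instruction (r10 replaces rcx).
SYSCALL_ARG_REGS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]


def syscall_registers(name: str, *args: int) -> dict:
    """Return the register values to set before executing `syscall`."""
    if len(args) > len(SYSCALL_ARG_REGS):
        raise ValueError("a system call takes at most 6 arguments")
    regs = {"rax": SYSCALL_NUMBERS[name]}
    regs.update(zip(SYSCALL_ARG_REGS, args))
    return regs


# read(0, buf, size) with a hypothetical buffer address and size.
setup = syscall_registers("read", 0, 0x601000, 0x100)
print(setup)
```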
ELF File Structure#
ELF File Format#
ELF (Executable and Linkable Format) is the binary executable file format in Linux.
ELF Header#
The command readelf -h can read the ELF file header. The ELF header includes the program's entry point (Entry Point Address), segment information, and section information. From the ELF header's Start of program headers and Start of section headers, the locations of the segment table and section table in the file can be located.
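For a feel of what `readelf -h` reads, here is a minimal sketch that decodes the fixed 64-byte header of a 64-bit little-endian ELF with `struct`. The function name and the sample field values are invented; the demo header is built in memory rather than read from a real file.

```python
import struct

# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_elf64_header(data: bytes) -> dict:
    """Decode the header fields used to locate segments and sections."""
    if len(data) < ELF64_HEADER.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    f = ELF64_HEADER.unpack_from(data)
    return {
        "entry": f[4],      # Entry point address
        "phoff": f[5],      # Start of program headers (segment table)
        "shoff": f[6],      # Start of section headers (section table)
        "phnum": f[10],     # Number of program headers
        "shnum": f[12],     # Number of section headers
    }


# Synthetic header with made-up values, for demonstration only.
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
sample = ELF64_HEADER.pack(ident, 3, 62, 1, 0x1040, 64, 0x3000,
                           0, 64, 56, 13, 64, 31, 30)
header = parse_elf64_header(sample)
print(header)
```

On a real binary, passing the first 64 bytes of the file to `parse_elf64_header` should give the same entry point and table offsets that `readelf -h` prints.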
Section Header Table#
Use the command readelf -S to read the section information (sections) of the binary ELF file. The program test has a total of 31 sections. Assembly language is written according to sections, such as the .text section and .data section. Assembly code corresponds one-to-one with machine code, and the section information is retained when the assembly program is converted into binary code.
readelf -S test
Program Header Table#
When an ELF program is executed (loaded into memory), the loader creates the process's memory image based on the program's segment table. Use the command readelf -l to read the segment information (segments) of the binary ELF file. The program test has a total of 13 segments, and the number of segments is smaller than the number of sections, so multiple sections may map to the same segment.
Based on the permissions of the sections: readable and writable sections are mapped into one segment, read-only sections are mapped into another segment, and so on.
readelf -l test
Linking View/Execution View#
Segment and Section are two different perspectives on the same ELF file. This is referred to as different views in ELF.
From the perspective of Section, the ELF file is the linking view.
From the perspective of Segment, it is the execution view.
The word "segment" is often used loosely for both: when discussing ELF loading, it refers specifically to a Segment; in other contexts, it usually refers to a Section.
libc#
glibc: GNU C Library, glibc itself is the C standard library under GNU, which gradually became the standard C library for Linux.
Its suffix is libc.so, and it is essentially an ELF file that can be executed independently. The dynamic link libraries encountered in pwn challenges are typically libc.so files.
Almost all programs in Linux depend on libc, so the functions in libc are crucial.
Lazy Binding Mechanism#
Static Compilation vs Dynamic Compilation#
Dynamically compiled (dynamically linked) executable files need to be accompanied by a dynamic link library, and at run time they call functions from that library. The advantages are a smaller executable file, faster linking, and saved system resources, since one copy of the library is shared. The disadvantages are that even a very simple program that uses only one or two functions from the library still needs the relatively large library to be present, and if the corresponding runtime library is not installed on another computer, the executable cannot run there.
Static compilation (static linking) means that when the executable file is built, the necessary parts are extracted from the corresponding static library (.a) and linked into the executable file, so that it does not depend on a dynamic link library (.so) at run time. Its advantages and disadvantages are therefore the mirror image of those of dynamically linked executables.
Lazy Binding#
Using lazy binding is based on the premise that in dynamic linking, the modules loaded by the program contain a large number of function calls.
Lazy binding postpones the binding of a function's address until the first call to that function, so that the dynamic linker does not have to process a large number of function reference relocations at load time.
The implementation of lazy binding uses two special data structures: the Global Offset Table (GOT) and the Procedure Linkage Table (PLT).
Global Offset Table (GOT)#
After a library function is called for the first time, the dynamic linker saves its resolved address in the GOT.
The Global Offset Table exists as separate sections in the ELF file, in two categories with the section names .got and .got.plt: .got stores the addresses for all external variable references, and .got.plt stores the addresses for all external function references. Lazy binding primarily uses the .got.plt table.
The first three items in .got.plt store special address references, and the remaining items hold one address per external function:
- GOT[0]: Stores the address of the .dynamic section, which the dynamic linker uses to extract dynamic linking-related information;
- GOT[1]: Stores the ID of this module (a pointer to its link_map structure);
- GOT[2]: Stores the address of the dynamic linker's _dl_runtime_resolve function, which is used to resolve the actual symbol addresses of shared library functions.
Procedure Linkage Table (PLT)#
To implement lazy binding, when calling a function from an external module, the program does not jump directly through the GOT but instead jumps through a specific entry stored in the PLT table. For every external function there is a corresponding entry in the PLT table, and each entry contains 16 bytes of code used to call that specific function.
In addition to the PLT entries created for the external functions called, the PLT also contains a special entry, PLT[0], which is used to jump to the dynamic linker for the actual symbol resolution and relocation work.
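The interplay between PLT and GOT can be imitated in a few lines. This is a loose model, not the real loader: `LazyModule`, `plt_call`, and the sample library are all invented, with a dictionary standing in for the GOT and a closure standing in for the resolver path through PLT[0].

```python
class LazyModule:
    """Toy model: each GOT slot starts as a resolver stub, then is patched."""

    def __init__(self, library: dict, imports: list):
        self.library = library            # stands in for the shared library
        self.resolve_count = 0
        self.got = {name: self._make_stub(name) for name in imports}

    def _make_stub(self, name: str):
        def stub(*args):
            # First call only: look up the symbol and patch the GOT slot.
            self.resolve_count += 1
            real = self.library[name]
            self.got[name] = real
            return real(*args)
        return stub

    def plt_call(self, name: str, *args):
        # The PLT entry always jumps through whatever the GOT slot holds.
        return self.got[name](*args)


library = {"add": lambda a, b: a + b}
module = LazyModule(library, ["add"])
first = module.plt_call("add", 1, 2)      # resolves, then calls
second = module.plt_call("add", 2, 3)     # goes straight to the real function
print(first, second, module.resolve_count)
```

In this model, overwriting `module.got["add"]` with another callable redirects every later call, which is the idea behind GOT hijacking.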
PLT and GOT#
Regardless of how many times an external function is called, the program actually calls the PLT table, which is composed of a series of assembly instructions.
So, one might wonder: why does PLT exist, and why not go directly to GOT?
This is like having many relatives; you need to visit them every week, so you write down their addresses in a notebook. When you want to visit, you look it up in the notebook. That notebook is like a PLT table, where each address jumps to the corresponding GOT address (your relatives' homes).
If one day you find it troublesome to run around, you invite all your relatives to live at your house, and now you only need to visit the corresponding room. The notebook becomes useless, and you throw it away. This is when you directly access the GOT without a PLT table.
Which do you think takes up more space, a notebook or a house full of relatives?
This is one reason for the existence of the PLT table: to utilize memory more efficiently.
Another reason is to enhance security.
LINUX Security Protection Mechanisms#
CANARY#
Canary is a protection mechanism against stack overflow attacks. Its basic principle is that, when a stack frame is created, an 8-byte random value (the canary) whose lowest byte is \x00 is copied from fs:0x28 and placed on the stack right next to the saved rbp, between the local variables and the saved rbp. When an attacker attempts to overwrite the saved rbp or the return address behind it through a buffer overflow, they will inevitably overwrite the canary first. Before the function returns, it checks whether the canary still matches the original value; if not, __stack_chk_fail is called and the program does not continue, thus preventing the buffer overflow attack.
Bypass Methods:
- Modify the canary.
- Leak the canary.
Canary Bypass#
- Format String Bypass
  - Read the value of the canary through a format string vulnerability.
- Canary Brute Force (for programs that use the fork function)
  - The fork function acts like self-replication: each time the process is copied, the memory layout is the same, and the value of the canary is also the same. Thus, we can brute-force it byte by byte. If the child crashes, the guessed byte is incorrect; if it runs normally, we continue with the next byte until the whole canary is found.
- Stack Smashing (deliberately triggering the canary check to leak data, known as an SSP leak)
- Hijacking __stack_chk_fail
  - Modify the address of the __stack_chk_fail function in the GOT table. After a stack overflow, the program calls this function, but since its address has been modified, execution jumps to the address we choose.
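The byte-by-byte brute force can be sketched against a simulated target. Everything here is hypothetical: `make_oracle` stands in for "send an overflow that rewrites the first bytes of the canary and see whether the forked child survives", and the secret value is made up.

```python
CANARY_LEN = 8


def make_oracle(canary: bytes):
    """Simulate a forking server: True means the child did not crash."""
    def survives(guess_prefix: bytes) -> bool:
        # Only the overwritten prefix must match; the remaining bytes are untouched.
        return canary[:len(guess_prefix)] == guess_prefix
    return survives


def brute_force_canary(survives):
    """Recover the canary one byte at a time; returns (canary, attempts)."""
    known = b"\x00"                     # the lowest byte is always \x00
    attempts = 0
    while len(known) < CANARY_LEN:
        for candidate in range(256):
            attempts += 1
            if survives(known + bytes([candidate])):
                known += bytes([candidate])
                break
        else:
            raise RuntimeError("no candidate byte survived")
    return known, attempts


secret = b"\x00" + bytes([0x11, 0xFE, 0x00, 0x7A, 0x42, 0x99, 0xC3])
recovered, attempts = brute_force_canary(make_oracle(secret))
print(recovered.hex(), attempts)
```

At most 7 × 256 = 1792 guesses are needed, compared with 2^56 for guessing the seven unknown bytes at once.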
NX#
With NX (No-eXecute) enabled, data on the stack is not executable. Writable regions such as the heap, stack, and bss segment of the program cannot be executed.
Bypass Methods:
Use the mprotect function to modify segment permissions. NX protection also does not affect exploitation methods that reuse existing code, such as ROP or GOT hijacking.
PIE and ASLR#
What is ASLR?#
ASLR is a feature option of the Linux operating system that applies when a program (ELF) is loaded into memory. It is a security protection technology against buffer overflow, which randomizes the loading address to prevent attackers from directly locating the attack code position, thus preventing overflow attacks.
Enabling and Disabling ASLR#
Check the current ASLR status of the system:
sudo cat /proc/sys/kernel/randomize_va_space
ASLR has three security levels:
- 0: ASLR is off
- 1: Randomizes the stack base address (stack), shared libraries (.so libraries), mmap base address
- 2: On top of 1, adds randomization of the heap base address (chunk)
What is PIE?#
PIE is a feature option of the gcc compiler that applies during the compilation of a program (ELF). It is a protection technology against fixed addresses for code segments (.text), data segments (.data), and uninitialized global variable segments (.bss). If a program has PIE protection enabled (and ASLR is active), the loading address changes each time the program is loaded, so the addresses reported by tools like ROPgadget are only offsets and cannot be used directly until the load base is known.
Enabling PIE#
Add the parameters -fPIE -pie when compiling and linking with gcc.
When PIE is enabled, it randomizes the loading addresses of the code segment (.text), initialized data segment (.data), and uninitialized data segment (.bss).
PIE Bypass#
The loading address of a program is aligned to memory pages (0x1000 bytes), so the last three hexadecimal digits of the program's base address must be 0. This means that the last three hex digits of any known offset are also the last three hex digits of the actual address. Knowing this, we have a way to bypass PIE: although we do not know the complete address, we know the last three hex digits, so we can take an address already on the stack and modify only its last two bytes (the last four hex digits). Only one of those four digits is unknown, so a guess succeeds about 1 time in 16.
Thus, the core idea of bypassing PIE is partial writing (partial address writing).
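A minimal sketch of the arithmetic behind partial writing follows. The base address, offsets, and function names are made up, and it assumes the saved address and the target lie in the same 64 KiB region so that only the low two bytes differ.

```python
import struct


def low_two_bytes(target_offset: int, nibble_guess: int) -> bytes:
    """Build the 2 bytes to write: 12 known bits plus one guessed hex digit."""
    low16 = ((nibble_guess & 0xF) << 12) | (target_offset & 0xFFF)
    return struct.pack("<H", low16)     # little-endian: the low byte is written first


def partial_overwrite(saved_addr: int, target_offset: int, nibble_guess: int) -> int:
    """Return the address that results from overwriting the low 2 bytes."""
    low16 = struct.unpack("<H", low_two_bytes(target_offset, nibble_guess))[0]
    return (saved_addr & ~0xFFFF) | low16


base = 0x555555554000                   # hypothetical randomized load base
saved = base + 0x1234                   # return address already on the stack
target = base + 0x1A3D                  # where we want execution to go

hits = [n for n in range(16) if partial_overwrite(saved, 0x1A3D, n) == target]
print(hits, low_two_bytes(0x1A3D, hits[0]).hex())
```

Because the stack stores addresses in little-endian order, an overflow reaches the low bytes of the saved address first, which is why overwriting only those bytes is possible.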
RELRO#
ReLocation Read-Only is a technique used to enhance the protection of binary data segments.