PWN101.1

PWN101 Part 1: Introduction

  • Intro to Assembly
  • The Stack
  • Linux, ELF
  • Basic Reverse Engineering
PWN101.1 - Assembly

What is PWN / RE

Compiled languages like C, C++, and Rust are converted into machine code that the CPU can execute. There are multiple binary file formats: raw machine code, PE on Windows, ELF on Linux.

What is PWN / RE

  • Reverse Engineering - examining binaries with disassemblers, decompilers, and debuggers to figure out what they do.
  • Binary Exploitation - using bugs, often memory corruption, to take control of a running program. Called pwn in CTFs.

Intro to Assembly

Architectures

Different CPU architectures have different instruction sets, registers, register sizes, etc.

  • x86 - common in PCs and laptops
  • ARM - phones, some Chromebooks, M1/M2 MacBooks
  • RISC-V, PowerPC, 6502, and others

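This workshop will focus on x86: the 32-bit version (also called IA-32) and its 64-bit extension (x86-64, also called x64 or AMD64). Not IA-64, which is Intel's unrelated Itanium architecture.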

x86 Assembly

The assembler converts assembly code into machine code.


Flat Assembler (fasm) is an x86 and x86-64 assembler that uses Intel-style syntax. Its programmer's manual is a great reference for learning assembly:
https://flatassembler.net/docs.php?article=manual

x86 Assembly

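    mov eax, 100
  infinite_loop:
    jmp infinite_loop

    B8          # mov eax, imm32
    64 00 00 00 # 0x00000064 == 100, stored little endian
    EB          # jmp with an 8-bit relative offset
    FE          # -2: relative to the next instruction, so it jumps back to itself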
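
A minimal C sketch of the two encodings above (the helper names are hypothetical): mov eax, imm32 is the opcode byte B8 followed by the 4-byte immediate, and a short jmp adds its signed 8-bit offset to the address of the instruction after it.

```c
#include <stdint.h>

/* Encode "mov eax, imm32": opcode byte B8, then the 32-bit immediate,
   least significant byte first (little endian). */
void encode_mov_eax_imm32(uint32_t imm, uint8_t out[5]) {
    out[0] = 0xB8;
    for (int i = 0; i < 4; i++) {
        out[1 + i] = (uint8_t)(imm >> (8 * i));
    }
}

/* "jmp rel8" is 2 bytes long (EB xx). The CPU adds the signed 8-bit offset
   to the address of the NEXT instruction, i.e. insn_addr + 2. */
uint32_t jmp_rel8_target(uint32_t insn_addr, uint8_t rel8) {
    /* interpret the byte as two's complement: 0x00-0x7F >= 0, 0x80-0xFF < 0 */
    int32_t offset = rel8 < 0x80 ? (int32_t)rel8 : (int32_t)rel8 - 256;
    return insn_addr + 2 + (uint32_t)offset;
}
```

With rel8 = 0xFE (-2), the target equals the jmp's own address, which is why the two bytes EB FE loop forever.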

x86 Assembly

    mov ebx, mydata
    add ebx, 2
    mov eax, [ebx]
    hlt

mydata:
    db 0xAA, 0xBB, 0xEF, 0xBE
    db 0xAD, 0xDE, 0xEF, 0xBE
    db 0xAD, 0xDE, 0xCC, 0xDD
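
To check what ends up in eax, the load can be redone in C. This is a sketch with a hypothetical helper, assuming x86's little-endian byte order: with ebx = mydata + 2, mov eax, [ebx] reads the bytes EF BE AD DE.

```c
#include <stdint.h>

/* The same bytes as mydata in the assembly above. */
static const uint8_t mydata[] = {
    0xAA, 0xBB, 0xEF, 0xBE,
    0xAD, 0xDE, 0xEF, 0xBE,
    0xAD, 0xDE, 0xCC, 0xDD,
};

/* Hypothetical helper modeling "mov eax, [ebx]": read 4 bytes at p and
   combine them little endian, so the byte at the lowest address is the
   least significant. */
uint32_t load_u32_le(const uint8_t *p) {
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

load_u32_le(mydata + 2), matching add ebx, 2, returns 0xDEADBEEF.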

Endianness

Endianness defines the order in which the bytes of a multi-byte number (e.g. a 4-byte integer) are stored in memory. x86 is little endian. For example, the number 0x11223344:

little endian: 44 33 22 11
big endian:    11 22 33 44
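
The same example as a C sketch (function names are hypothetical): shifting the value right and keeping one byte at a time writes it in either order, independent of the host CPU's own endianness.

```c
#include <stdint.h>

/* Little endian (x86): least significant byte at the lowest address. */
void store_u32_le(uint32_t v, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Big endian (also called network byte order): most significant byte first. */
void store_u32_be(uint32_t v, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * (3 - i)));
}
```

store_u32_le(0x11223344, buf) fills buf with 44 33 22 11; store_u32_be fills it with 11 22 33 44.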

Register Sizes
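
On x86-64 the smaller registers are views into the same 64-bit register: eax is the low 32 bits of rax, ax the low 16 bits, al the low 8 bits and ah bits 8-15 (rbx, rcx and rdx follow the same pattern). A minimal C sketch of those views, treating a uint64_t as rax; the function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helpers that treat a uint64_t as the value of rax. */

/* Reading the narrower registers: each is a slice of rax. */
uint32_t read_eax(uint64_t rax) { return (uint32_t)rax; }         /* bits 0-31 */
uint16_t read_ax(uint64_t rax)  { return (uint16_t)rax; }         /* bits 0-15 */
uint8_t  read_al(uint64_t rax)  { return (uint8_t)rax; }          /* bits 0-7  */
uint8_t  read_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }   /* bits 8-15 */

/* Writing eax in 64-bit mode zero-extends: the upper 32 bits of rax become 0. */
uint64_t write_eax(uint64_t rax, uint32_t value) {
    (void)rax;                    /* the old value is discarded entirely */
    return value;
}

/* Writing al (or ax, ah) leaves the rest of rax unchanged. */
uint64_t write_al(uint64_t rax, uint8_t value) {
    return (rax & ~(uint64_t)0xFF) | value;
}
```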

Sample 1

main:
    mov ebx, mydata

  .mylabel:
    mov al, [ebx]
    inc ebx
    cmp al, 0
    jne .mylabel

    dec ebx
    sub ebx, mydata
    ; what is the value of ebx here?

mydata:
    db 0x48, 0x65, 0x6c, 0x6c
    db 0x6f, 0x2c, 0x20, 0x57
    db 0x6f, 0x72, 0x6c, 0x64
    db 0x21, 0x00, 0x00, 0x00

Sample 1

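#include <stdbool.h>
#include <stddef.h>

char mydata[] = "Hello, World!";

int main(void) {
    char *ebx = mydata;
    while (true) {
        char al = *ebx;
        ebx++;
        if (al == 0) {
            break;
        }
    }
    ebx--;
    // in C, subtracting two pointers gives an integer count (ptrdiff_t),
    // so the result goes in a separate variable
    ptrdiff_t len = ebx - mydata;
    return (int)len;
}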

Sample 2

myfunction:
    ; myfunction assumes ebx and ecx are pointers to data
    mov edx, 1
  .loop:
    mov ah, [ebx]
    mov al, [ecx]
    cmp al, ah
    jne .break
    test al, al
    je .end
    inc ebx
    inc ecx
    jmp .loop
  .break:
    xor edx, edx
  .end:
    ret ; what is the value of edx here?

Sample 2

#include <stdbool.h>

bool myfunction(char *ebx, char *ecx) {
    while (true) {
        char ah = *ebx;
        char al = *ecx;
        if (ah != al) {
            return false;
        }
        if (ah == 0) {
            return true;
        }
        ebx++;
        ecx++;
    }
}

Break

PWN101.1 - Stack

The Stack

Memory Layout

The stack and heap are dynamically-sized sections of memory.

  • Stack - grows with function calls, contains function-local data and control flow information.
  • Heap - contains data persistent between function calls, manually allocated and released.
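
A short C sketch of the difference (heap_copy is a hypothetical helper): memory from malloc stays valid after the function that allocated it returns, until someone calls free, whereas a local array disappears with its stack frame.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated copy of s, or NULL if
   malloc fails. The copy outlives this function's stack frame, so the
   caller owns it and must release it with free(). */
char *heap_copy(const char *s) {
    size_t n = strlen(s) + 1;     /* +1 for the terminating NUL */
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* By contrast, returning a local array is a bug: buf lives in the
   function's stack frame, which is gone once the function returns.

       char *broken_copy(const char *s) {
           char buf[64];
           strcpy(buf, s);
           return buf;            // dangling pointer
       }
*/
```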

What's on the Stack?

Function local data, like variables.

#include <string.h>

int main(void) {
    // integer (100) is on the stack
    int x = 100;
    // pointer is on the stack, pointing to the string
    // in a data section
    const char *asdf = "hello, world";
    // copies the actual string onto the stack
    // (12 characters plus the terminating NUL = 13 bytes)
    char buf[13];
    strcpy(buf, asdf);
    return 0;
}

What's on the Stack?

Control flow information

int add(int x, int y) {
    return x + y;
}

int main() {
    return add(100, 200);
}

If add can be called from anywhere, how does it know where to return to? How do we pass 100 and 200 into add?

Stack Registers & Instructions

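The stack pointer (rsp) points to the top of the stack, which on x86 is its lowest address because the stack grows downward. The base pointer (rbp) points to the base of the current function's stack frame.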

Stack Registers & Instructions

While you can manipulate rsp and rbp directly, you can also use push and pop to work with the stack.

  • push decrements rsp (by 8 bytes for a 64-bit value), then writes data to [rsp], growing the stack
  • pop reads data from [rsp], then increments rsp, removing it from the stack
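
A toy C model of the two instructions, assuming 8-byte stack slots as on x86-64 (the toy_stack type and its functions are hypothetical, for illustration only). It keeps the same order of operations: push moves rsp down before writing, pop reads before moving rsp up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy stack: 'mem' stands in for memory and 'rsp' is a byte
   offset into it. As on x86, the stack grows downward, toward offset 0.
   No overflow/underflow checks, for brevity. */
#define STACK_SIZE 64

typedef struct {
    uint8_t mem[STACK_SIZE];
    size_t rsp;
} toy_stack;

/* An empty stack: rsp points just past the end of the memory. */
void toy_init(toy_stack *s) {
    s->rsp = STACK_SIZE;
}

/* push: first make room (rsp -= 8), then write the value at [rsp]. */
void toy_push(toy_stack *s, uint64_t value) {
    s->rsp -= sizeof value;
    memcpy(&s->mem[s->rsp], &value, sizeof value);
}

/* pop: first read the value at [rsp], then release the slot (rsp += 8). */
uint64_t toy_pop(toy_stack *s) {
    uint64_t value;
    memcpy(&value, &s->mem[s->rsp], sizeof value);
    s->rsp += sizeof value;
    return value;
}
```

After two pushes rsp has moved down by 16 bytes, and the pops return the values in reverse order (last in, first out).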

Setting up a Stack Frame

Most functions begin with a function prologue, a series of instructions that sets up a new stack frame, and end with its inverse, the function epilogue, which restores the previous frame.

    push rbp      ; save the caller's rbp so its frame can be restored
    mov  rbp, rsp ; rbp now marks the base of the new frame
    sub  rsp, 64  ; reserve space for local vars (a multiple of 16 keeps rsp aligned)
    ...
    ; local vars are usually referenced relative to rbp
    ; for example [rbp-8] or [rbp-40]
    ...
    mov  rsp, rbp ; pop the whole stack frame
    pop  rbp      ; restore the previous frame

The x86 instructions enter and leave can stand in for the prologue and epilogue; leave is equivalent to mov rsp, rbp followed by pop rbp. In practice, leave is widely used but enter is not.

Calling Functions

  some_function:
    push rbp
    mov  rbp, rsp
    sub  rsp, 32
    mov  rdi, 100
    mov  rsi,  20
    call add_rdi_rsi
    mov  [rbp-8], rax
    ...
    leave
    ret

  add_rdi_rsi:
    add rdi, rsi
    mov rax, rdi
    ret

Calling Functions

The call func instruction behaves like push rip; jmp func (conceptually; x86 has no literal push rip instruction). It stores the return address, i.e. the address of the instruction right after the call, onto the stack.

    call add_rdi_rsi  ; push rip / push <ptr to mov [rbp-8], rax>
    mov  [rbp-8], rax

The ret instruction is equivalent to pop rip, taking the most recent stack entry and putting it into the instruction pointer. Essentially a jmp back to the stored return address.
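
A toy C model of call and ret (toy_cpu and its functions are hypothetical, assuming 8-byte return addresses and a 5-byte call rel32 instruction). The key point is that ret does not know where it came from; it trusts whatever 8 bytes sit at [rsp].

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 128

/* Hypothetical toy CPU: stack memory plus the rsp and rip registers. */
typedef struct {
    uint8_t mem[MEM_SIZE];
    uint64_t rsp;
    uint64_t rip;
} toy_cpu;

/* call: push the return address, then jump. Here rip holds the address of
   the call instruction itself, so the return address is rip + insn_len
   (5 bytes for call rel32). */
void toy_call(toy_cpu *c, uint64_t target, uint64_t insn_len) {
    uint64_t ret_addr = c->rip + insn_len;
    c->rsp -= sizeof ret_addr;
    memcpy(&c->mem[c->rsp], &ret_addr, sizeof ret_addr);
    c->rip = target;
}

/* ret: pop whatever 8 bytes are at [rsp] into rip. Nothing checks that
   they are a genuine return address. */
void toy_ret(toy_cpu *c) {
    memcpy(&c->rip, &c->mem[c->rsp], sizeof c->rip);
    c->rsp += sizeof c->rip;
}
```

Because toy_ret trusts the stack, overwriting the saved slot at c->mem[c->rsp] before returning sends rip wherever those bytes point.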

Arguments (32)

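On 32-bit x86 Linux (the cdecl convention), function arguments are pushed onto the stack in reverse order, so the first argument ends up closest to the return address. add(100, 200, 300) would compile to:

    push 300
    push 200
    push 100
    call add
    add  esp, 12  ; the caller removes the three 4-byte arguments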

Arguments (64)

On x86-64 Linux (the System V AMD64 ABI), the first six integer or pointer arguments are passed in registers rdi, rsi, rdx, rcx, r8, r9, in that order. Any further arguments go on the stack in reverse order, as on 32-bit.

    mov rdi, 100
    mov rsi, 200
    mov rdx, 300
    call add
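
A C function with eight arguments makes the assignment concrete under the System V AMD64 ABI used by Linux (sum8 is a hypothetical example). The comments mark where each argument is passed; the return value comes back in rax.

```c
/* Hypothetical example: the first six integer arguments travel in
   registers, the 7th and 8th on the stack. The result is returned in rax. */
long sum8(long a,   /* rdi */
          long b,   /* rsi */
          long c,   /* rdx */
          long d,   /* rcx */
          long e,   /* r8 */
          long f,   /* r9 */
          long g,   /* stack */
          long h)   /* stack */
{
    return a + b + c + d + e + f + g + h;
}
```

Compiling it with gcc -S -masm=intel and reading the output should show g and h being loaded from memory relative to rsp or rbp, while a through f come straight from registers.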

Calling Conventions

The order and placement of args, which registers must be preserved (e.g. rbp), and how stack frames are cleaned up are together called a calling convention. These slides apply to most Linux systems, but calling conventions differ across platforms and even binaries.

https://en.wikipedia.org/wiki/X86_calling_conventions

PWN101.1 - Linux / ELF

Linux Binaries

ELF Files

Linux binaries are packaged into ELF files.

$ file /bin/ls
/bin/ls: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=2f15ad836be3339dec0e2e6a3c637e08e48aacbd, for
GNU/Linux 3.2.0, stripped
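
The "ELF 64-bit LSB" part of that output comes from the first bytes of the file: the magic number 7F 45 4C 46 ("\x7fELF"), then a class byte (1 = 32-bit, 2 = 64-bit) and a data byte (1 = little endian/LSB, 2 = big endian/MSB). A minimal C sketch that decodes them from a buffer (elf_ident is a hypothetical helper):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode the start of an ELF header (e_ident).
   On success returns 1 and sets *bits (32 or 64) and *little_endian
   (1 = LSB, 0 = MSB). Returns 0 if the bytes are not a recognizable
   ELF identification. */
int elf_ident(const uint8_t *hdr, size_t len, int *bits, int *little_endian) {
    static const uint8_t magic[4] = {0x7F, 'E', 'L', 'F'};
    if (len < 6 || memcmp(hdr, magic, sizeof magic) != 0)
        return 0;

    switch (hdr[4]) {                     /* EI_CLASS */
    case 1: *bits = 32; break;
    case 2: *bits = 64; break;
    default: return 0;
    }

    switch (hdr[5]) {                     /* EI_DATA */
    case 1: *little_endian = 1; break;    /* "LSB" in file's output */
    case 2: *little_endian = 0; break;    /* "MSB" */
    default: return 0;
    }
    return 1;
}
```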

ELF Files

These files contain all the data needed to run the machine code as a Linux process:

  • the machine code (.text)
  • data sections (.data, .bss)
  • entry point
  • symbols
  • dynamic linking information (required libraries and imported functions)
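
For example, the entry point is a field of the ELF header. A sketch that reads it, assuming a Linux system where glibc's <elf.h> provides Elf64_Ehdr, and a file whose byte order matches the host (read_elf64_entry is a hypothetical helper):

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the ELF header of a 64-bit ELF file and store
   its entry point in *entry. Returns 0 on success, -1 on failure.
   Assumes the file's byte order matches the host's (true for x86-64
   binaries on an x86-64 machine). */
int read_elf64_entry(const char *path, uint64_t *entry) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr eh;
    size_t got = fread(&eh, 1, sizeof eh, f);
    fclose(f);

    if (got != sizeof eh)
        return -1;                               /* file too short */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                               /* no \x7fELF magic */
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;                               /* not a 64-bit ELF */

    *entry = eh.e_entry;                         /* entry point as stored in the file */
    return 0;
}
```

Calling it on /bin/ls should give the same value that readelf -h /bin/ls prints as the entry point address.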

ELF Files

You can examine all the ELF metadata with readelf -a /bin/ls.

You can also dump the assembly code with objdump -D -M intel /bin/ls, though dedicated disassemblers do a better job.

Dynamic Linking

When code calls functions from another library, that library can be statically or dynamically linked.

  • static - includes library code into the final binary
  • dynamic - includes only a reference to the library function; the library must be installed on the user's system or shipped alongside the binary.

Dynamic Linking

On Linux, shared libraries are stored in shared object files with the extension .so. These are also ELF files.

The executable ELF lists which libraries and functions it needs, and they are linked at runtime. The details of this linking will be important for exploits later, but not yet.

Reverse Engineering Tools

  • debugger
    • gdb
    • lldb
  • disassembler
    • IDA
    • Ghidra
    • radare2
    • Binary Ninja