Intro to Reverse Engineering
Adam Hassan / October 2022 (1226 Words, 7 Minutes)
s/o to tjcsc. much of this content was taken from them.
Transcript:
Intro to Reverse Engineering (speedrun)
Schedule
We’re covering a lot today:
- Simple Reversing Tools
- Assembly Basics
- GDB / GEF Basics
- Ghidra Basics
- SMT Solver Basics
- 8 sample ctf-style reverse engineering challenges
The best way to learn is to do. You can use these slides, me, and google for help. Take notes, but you can reference the slides for most things.
Why do you want to do reverse engineering?
- CTF challenges
- Game Hacking: tools like Cheat-Engine let you modify memory, and you can crack games (don’t do this if it’s against TOS)
- Analyze Malware
- Makes you a better programmer and debugger!!
The Basics (ezpz)
Tools for easy “reverse engineering” - strings
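For intuition, here is roughly what strings does, sketched in Python: scan the raw bytes for runs of printable ASCII that are at least 4 characters long (4 is the real tool's default minimum). The name extract_strings and the sample bytes are made up for this sketch, and it skips the encoding options the real tool has.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII of at least min_len bytes, like `strings`."""
    # 0x20-0x7e is printable ASCII; tab is allowed as well
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical blob standing in for open(path, "rb").read() on a real binary
blob = (b"\x7fELF\x02\x01\x00\x00flag{not_the_real_flag}\x00"
        b"\x01\x02hi\x00Enter password: \x00")

for s in extract_strings(blob):
    print(s)
```

Short runs like "ELF" and "hi" are dropped because they are under the minimum length, which is also why the real tool sometimes misses very short strings.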
Tools for easy “reverse engineering” - ltrace ltrace: library call tracer
Tools for easy “reverse engineering” - strace strace: system call tracer
Tools for easy “reverse engineering” - rabin2 rabin2: Binary program info extractor rabin2 -z : shows strings
Tools for easy “reverse engineering” - rabin2 rabin2: Binary program info extractor rabin2 -s : shows symbols
Assembly (i’m not teaching you this)
Assembly is how we make machine code (binary) readable: a set of mnemonic instructions that lets you understand how the code works.
jk i’ll teach you the ~basics~
- mov a, b: moves b into a
- add a, b: adds b to a
- push a: pushes a onto the stack
- cmp a, b: compares a to b (sets flags that a following conditional jump checks)
- jmp a: jumps to the address a
- call a: calls the function at address a
This is Intel syntax, which is easy to read.
Heavily simplified, but that’s how it works.
Also: sub, inc, lea, xor, and, pop, nop
Also some weird ones: mpsadbw, cmpxchg, prefetcht0
Learn the basics and google the rest.
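If it helps, the instructions above can be modeled with a toy Python sketch. This is an assumption-heavy simplification, not how a CPU works: registers are a dict, the stack is a Python list, operands are either a register name or a constant, and cmp only tracks a zero flag. All names here are made up for illustration.

```python
# Toy model: two 32-bit registers and a stack (real stacks live in memory)
regs = {"eax": 0, "ebx": 0}
stack = []

def mov(dst, value):
    """mov dst, value: copy a constant into a register."""
    regs[dst] = value & 0xFFFFFFFF

def add(dst, value):
    """add dst, value: add a constant to a register, wrapping at 32 bits."""
    regs[dst] = (regs[dst] + value) & 0xFFFFFFFF

def push(src):
    """push src: put the register's value on top of the stack."""
    stack.append(regs[src])

def pop(dst):
    """pop dst: take the top of the stack into a register."""
    regs[dst] = stack.pop()

def cmp(a, b):
    """cmp a, b: compare two registers; only the zero flag is modeled."""
    return {"zf": regs[a] == regs[b]}

mov("eax", 5)
add("eax", 3)
push("eax")
pop("ebx")
flags = cmp("eax", "ebx")
print(regs, flags)
```

A conditional jump like je would then look at flags["zf"] to decide whether to jump.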
Registers (x86)
EAX, EBX, ECX, EDX (all general purpose): these are just general places to store “stuff”. ESI and EDI are both general purpose too.
AX, BX, CX, and DX access the lower 16 bits of EAX, EBX, ECX, and EDX.
AH, BH, CH, and DH access the high byte (bits 8-15) of AX, BX, CX, and DX.
AL, BL, CL, and DL access the low byte (bits 0-7) of the same registers.
EAX, AX, AH, and AL are called the Accumulator registers. Used for I/O port access, arithmetic operations, logical operations, interrupt calls, etc…
EBX, BX, BH, and BL are all the Base registers. Used as base pointers for memory access. On 32-bit Linux, EBX holds the first system call argument. Sometimes used to store the return values of interrupts.
ECX, CX, CH, and CL are all the Counter registers.
EDX, DX, DH, and DL are all the Data registers. Used for I/O port access, arithmetic, and some interrupt calls.
EBP, ESP, and EIP are reserved (and important!!) registers:
- EBP: Base Pointer
- ESP: Stack Pointer
- EIP: Instruction Pointer
x86_64 versions: RBP, RSP, RIP
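The way AX, AH, and AL overlap with EAX is easier to see with some bit masking. A small Python sketch, where sub_registers is a made-up helper and 0x12345678 is just a sample value:

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into the pieces the smaller names refer to."""
    eax &= 0xFFFFFFFF          # keep only 32 bits
    ax = eax & 0xFFFF          # AX: low 16 bits of EAX
    return {
        "eax": eax,
        "ax": ax,
        "ah": (ax >> 8) & 0xFF,  # AH: bits 8-15
        "al": ax & 0xFF,         # AL: bits 0-7
    }

parts = sub_registers(0x12345678)
print({name: hex(value) for name, value in parts.items()})
```

Note that there is no name for the upper 16 bits of EAX on their own; writing to AL or AH changes only that byte of EAX.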
There’s a lot more to learn. You’ll learn as you go. Jon usually gives a very good x86 assembly lecture every year.
More Resources: HackUCF x86 Crash Course, Godbolt Compiler Explorer, C Stack Visualizer
GDB/GEF (jeff)
What is GDB? A debugger for compiled binaries. Usually used to debug programs, but it can also be used to reverse programs with dynamic analysis.
Dynamic analysis: testing a program by executing it in real time.
$ gdb myProgram
Basic Commands
run
break
x/nfu address
- n: repeat count (how many units of memory to display)
- f: display format (x|d|u|o|t|a|c|f|s) or i, m
- u: unit size (g|w|h|b, i.e. 8, 4, 2, or 1 bytes)
e.g. x/5xw, x/s &myVariable
& is used to mean the address of a variable. In this case, we’re using it to examine the string at the memory address of myVariable.
print expression
e.g. print $rax
disas myFunction
These let us see things.
Getting more info: info lets you get a lot of info, e.g. info break, info registers, info frame, info symbol, info address, info variables
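To get a feel for what x/nfu prints, here is a rough Python imitation of x/2xw on a chunk of bytes. It assumes a little-endian target (true for x86), and examine_words plus the sample bytes are invented for this sketch.

```python
import struct

def examine_words(memory: bytes, count: int) -> list:
    """Mimic x/<count>xw: read `count` little-endian 4-byte words, show as hex."""
    words = struct.unpack_from("<%dI" % count, memory, 0)
    return ["0x%08x" % w for w in words]

# Hypothetical 8 bytes of process memory, in the order they sit in memory
memory = bytes.fromhex("efbeadde78563412")
print(examine_words(memory, 2))
```

The bytes ef be ad de come out as 0xdeadbeef because the least significant byte is stored first. That is why x/4xb and x/1xw on the same address look "reversed" relative to each other.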
A lot more commands There are a lot of other commands If you want to do something, there’s probably a way to do it! help
GDB Enhanced Features (GEF): a GDB plugin that provides really useful features specifically for binary reversing and exploitation. https://github.com/hugsy/gef
INSTALL: bash -c "$(wget https://gef.blah.cat/sh -O -)"
There are alternatives to GEF, but GEF is better bc it’s easier.
Some Interesting GEF Commands: assemble, checksec, vmmap, telescope, heap, context, shellcode, search-pattern, grep. There are some other ones that can be really useful later, such as pattern (create|search).
The best way to learn about GDB is to use it!
Resources GDB Homepage Nice GDB Quick Reference Sheet GEF Documentation
Ghidraaaaaaaaaa (the best)
What is Ghidra? Reverse engineering tool made by the NSA ?! Has a great decompiler. Pre-installed on Kali. Used for static analysis: analyzing the program without running it.
Start by opening your program and pressing Analyze
Some Other Tools for Static Analysis: IDA Pro (Very Expensive), Binary Ninja (Expensive, but made at UF :0), rizin (Free; a fork of radare2)
It doesn’t really matter what you use; just get accustomed to it
how do I use it ???
- Make a new project the first time you start Ghidra
- File > Import File (or I) to import a file
- Double click the file to open it
- Press “Yes” or “OK” on everything to analyze it
https://ghidra-sre.org/
Decompiling Functions
- Note that C code is not compiled 1-to-1 to machine code
- Click on a function to decompile it
- It’s not always accurate! Make sure to look at the assembly listing too
- While you can usually get assembly code from a program, it is more difficult to get the “C version.”
how do I read efficiently???
Renaming Variables Change the name for the variable In the Decompile pane or the Listing pane, press L to rename a variable
Retyping Variables Set/change the data type of the variable In the Listing pane, press T to retype a variable In the Decompile pane, press Ctrl+L to retype a variable
Note that you can also retype arrays!
Making Arrays Press [ to make an array
Making Structs
struct: a collection of variables under a single name. Note that most, if not all, compilers pad structs with bytes you don’t need; you might need to analyze the program a little to see what you do/don’t need to access directly.
- Set name and size of struct
- Set data type(s) of member(s)
- Set name of member(s)
After this, you can set a variable’s data type to this struct, like how you would with other data types.
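Struct padding is easy to see from Python with ctypes, which lays out fields the way the platform's C compiler would. The Record struct below is hypothetical, and the offsets in the comments assume a typical platform where a 4-byte int is 4-byte aligned.

```python
import ctypes

class Record(ctypes.Structure):
    # Hypothetical layout: 1-byte tag, 4-byte value, 1-byte flag
    _fields_ = [
        ("tag", ctypes.c_uint8),
        ("value", ctypes.c_uint32),
        ("flag", ctypes.c_uint8),
    ]

# Offsets of each member and the total size, padding included
layout = {name: getattr(Record, name).offset for name, _ in Record._fields_}
print(layout, ctypes.sizeof(Record))
```

Only 6 bytes hold data, but the struct takes 12: three padding bytes after tag so value is aligned, and three more at the end. When sizing a struct in Ghidra, this padded size is the one that matters.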
Exporting (Global) Data
https://i.imgur.com/SlDIjzu.png
To unpack your data, you can use the struct package in Python
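A minimal sketch of that, assuming the exported global is three little-endian 32-bit integers followed by 4 raw bytes. The bytes and the layout are invented for illustration; swap in whatever Ghidra shows for your data.

```python
import struct

# Hypothetical exported bytes: 3 little-endian uint32 values, then 4 raw bytes
exported = bytes.fromhex("01000000" "ff000000" "39050000" "666c6167")

# "<" = little-endian with no padding, "3I" = three uint32, "4s" = 4-byte string
a, b, c, tag = struct.unpack("<3I4s", exported)
print(a, b, c, tag)
```

struct.calcsize("<3I4s") tells you how many bytes a format expects, which is a quick sanity check against the size of the export.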
When to GDB and when to Ghidra? Well… it depends. It’s useful to use both GDB and Ghidra.
- GDB: for dynamic analysis; useful for looking at memory
- Ghidra: useful for understanding code & memory addresses
As you get more accustomed to both, you will develop an intuition as to when you want to be more GDB-heavy or Ghidra-heavy.
Resources: Ghidra Homepage, Ghidra Cheat Sheet
SMT Solvers (automatic reverse engineering??)
What is SMT? SMT stands for Satisfiability Modulo Theories. SMT solvers can solve logic stuff: code, math, booleans. angr uses one under the hood.
Z3 - Boolean Logic
Z3 - Integer Logic
Z3 - BitVectors
Z3 - Example
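A minimal sketch of a Z3 solve in Python, assuming the z3-solver package is installed. The check being reversed (XOR with a constant, add a constant, compare) is invented for illustration and is not from any specific challenge.

```python
def recover_input():
    """Ask Z3 for the 32-bit input that passes a made-up check."""
    # z3 comes from the third-party z3-solver package; it is imported here
    # so this file still loads on machines where it is not installed
    from z3 import BitVec, Solver, sat

    x = BitVec("x", 32)  # 32-bit value; arithmetic wraps like a C uint32_t
    solver = Solver()
    # Hypothetical check taken from a decompiled function
    solver.add((x ^ 0xDEADBEEF) + 0x1337 == 0xCAFEBABE)

    if solver.check() != sat:
        return None  # no input satisfies the constraints
    return solver.model()[x].as_long()
```

BitVec matters here: a plain Int would not wrap around on overflow or support XOR, so it would not match what the binary does.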
angr
Binary analysis framework written by Shellphish, a CTF team made up of PhD students at UCSB. Acquired by ASU. Now being worked on by one of UFSIT’s alumni.
Built on top of:
- CLE (CLE Loads Everything): binary loader
- PyVEX: binary translator, used to emulate the instructions
- Claripy: front end for Z3, the backend solver
Many features:
- Control Flow Graph Generation
- Symbolic Execution
- You can execute C functions inside of python :0
- ROP Gadget Finder
- GUI (called angr-management)
angr - solving based on the win/lose addresses
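A rough sketch of the address-based approach, assuming angr is installed and that the binary reads its input from stdin. The function name and its parameters are placeholders; the win and lose addresses are ones you would look up in Ghidra first.

```python
def solve_by_address(binary_path, win_addr, lose_addr):
    """Explore until execution reaches win_addr, pruning paths that hit lose_addr."""
    # angr is third-party; imported here so the file loads without it
    import angr

    project = angr.Project(binary_path, auto_load_libs=False)
    state = project.factory.entry_state()
    simgr = project.factory.simulation_manager(state)

    # find/avoid accept addresses of instructions in the loaded binary
    simgr.explore(find=win_addr, avoid=lose_addr)

    if not simgr.found:
        return None
    # Concrete stdin bytes that drive execution down the winning path
    return simgr.found[0].posix.dumps(0)
```

For position-independent binaries, angr picks its own base address when loading, so addresses copied from Ghidra may need to be rebased to match.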
angr - solving based on the win/lose strings
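A rough sketch of the string-based approach, which avoids looking up addresses by checking what each path has printed so far. It assumes angr is installed; the default win and lose messages here are placeholders for whatever the target binary prints.

```python
def solve_by_output(binary_path, win_text=b"Correct", lose_text=b"Wrong"):
    """Explore until a path has printed win_text, pruning paths that printed lose_text."""
    # angr is third-party; imported here so the file loads without it
    import angr

    project = angr.Project(binary_path, auto_load_libs=False)
    simgr = project.factory.simulation_manager(project.factory.entry_state())

    # find/avoid can also be functions that take a state and return True/False;
    # posix.dumps(1) is everything the path has written to stdout
    simgr.explore(
        find=lambda s: win_text in s.posix.dumps(1),
        avoid=lambda s: lose_text in s.posix.dumps(1),
    )

    # posix.dumps(0) is the stdin that produced the winning path
    return simgr.found[0].posix.dumps(0) if simgr.found else None
```

This tends to be slower than giving addresses, since the check runs on every step, but it is handy when the binary is stripped or the addresses move around.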
angr - resources My notes on this: https://bit.ly/3SH5cb5 https://github.com/ViRb3/z3-python-ctf https://ericpony.github.io/z3py-tutorial/guide-examples.htm https://docs.angr.io/ https://docs.angr.io/examples
More Tools (not everything is an x86 binary)
Other tools to keep in mind
- dotPeek or dnSpy: decompile .NET executables
- jadx and jadx-gui: decompile apks
- devtoolzone: decompile java online
- apktool: decompile apks (apktool d *.apk)
Practice!
TryHackMe - ReversingElf https://tryhackme.com/room/reverselfiles Make an account and do the activities Link to these slides: https://bit.ly/3CejLMf Use them as a reference If you don’t have an x86 computer (M1, M2…) ssh root@147.182.163.55 password: Zdq&3d&gB8TSNd gef is already installed
TryHackMe - ReversingElf - Writeup https://bit.ly/3T1PirR