It's not about reading CISC, it's about the flows
In Exploit Development and Malware Analysis it's not about reading assembly. It's about understanding and directing the control flow. That is valuable. Who needs all these countless lines of instructions anyways?
- In case you are writing Shellcode you need the low-level controls of assembly to create compact pieces of injection code. It just needs to fit and re-direct the control flow.
- In case of Malware Analysis you usually do not have the source code, and therefore you fall back to the low-level and you have to use a disassembler.
Like two sides of the same coin: assemble Shellcode, disassemble Malware. Whoever can do the one can also do the other. That's where the value is: in creating or reverse-engineering the control flow. The rest is clutter.
But many beginners are impressed if they fire up IDA Pro or OllyDBG for the first time. After reading this wiki article you won't be impressed any more. You will be impressive.
And the only thing left to think about is whether you want to write `pop pop ret` or the hex equivalent. Which is?
tl;dr: There are assembly reading strategies. There is a structure to it. Make use of it, save time, get the job done. That's what this wiki page is about.
This wiki article focuses on patterns and reading strategies rather than concepts. If you are really interested read Chris Eagle's IDA Pro book. I personally did not like Randall Hyde's "The Art of Assembly Language" because it's too artificial.
IA-32 general purpose registers
These registers can be used for everything. Commonly there are the following orientation points:
EAX usually has return values (depending on the calling convention of course). Often used for addition and multiplication.
ECX is used as a counter and for the `this` pointer in C++ (depending on the compiler of course).
EBP is used to reference args and vars (from the stack)
ESI and EDI are typically used for memory management (source and destination of copy operations)
EIP is the finger, which points to what is going to be executed next. That's why shellcoders like it.
EFLAGS is used to represent computation results.
CS, DS and SS point to the code, data and stack segment of the process.
WORD is a set of bits
In IA-32 a register is 32 bits. That is a double-word, or a DWORD.
That means EAX, EBX, ECX, EDX ... (the IA-32 registers) get accessed as a DWORD. That is 4 bytes. 8 bits x 4 = 32 bits = DWORD. Simple as that. You can address them like this:
EAX = DWORD = 32 bits
AX = low WORD of EAX = 16 bits
AH = high byte of AX = 8 bits
AL = low byte of AX = 8 bits
That mostly matters for the `mov` instructions. We will use `AL`. It still happens, even in x86-64. Obviously...
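The overlay of `AX`, `AH` and `AL` on `EAX` can be sketched with plain bit operations. This is only a model: the helper names (`ax_of`, `ah_of`, `al_of`) are made up for illustration and do not touch real registers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: they model which bits of EAX each sub-register names.
constexpr std::uint16_t ax_of(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax & 0xFFFFu);       // low 16 bits
}
constexpr std::uint8_t ah_of(std::uint32_t eax) {
    return static_cast<std::uint8_t>((eax >> 8) & 0xFFu);   // bits 8..15
}
constexpr std::uint8_t al_of(std::uint32_t eax) {
    return static_cast<std::uint8_t>(eax & 0xFFu);          // bits 0..7
}

// Small self-check with an arbitrary example value.
inline void check_subregisters() {
    const std::uint32_t eax = 0x12345678u;
    assert(ax_of(eax) == 0x5678u);
    assert(ah_of(eax) == 0x56u);
    assert(al_of(eax) == 0x78u);
}
```

Note that the upper 16 bits of `EAX` (0x1234 in the example) have no name of their own.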
Addressing - relative and indirect
In IDA Pro disassembly [EBP+foo] may be something like [EBP+0x42] or [EBP-0x42]. There will always be a `+`, because arithmetically `7 + (-2) == 7 - 2`. IDA prefers symbolic names over raw numbers in its disassembly listings. It will always be `+foo` even though foo is negative.
x86 uses "relative addressing". But with IDA Pro this is not a real problem.
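A small sketch of why `+foo` also covers negative offsets: the displacement is a signed value that is added to the base register. The function name and the example addresses are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of [EBP+disp]: disp is signed, the addition wraps
// modulo 2^32 in this model, which mirrors 32-bit address arithmetic.
constexpr std::uint32_t effective_address(std::uint32_t ebp, std::int32_t disp) {
    return ebp + static_cast<std::uint32_t>(disp);
}

// Example base address is made up; only the arithmetic matters.
inline void check_effective_address() {
    assert(effective_address(0x0012FF70u, 0x42) == 0x0012FFB2u);   // above EBP
    assert(effective_address(0x0012FF70u, -0x42) == 0x0012FF2Eu);  // below EBP
}
```

So `[EBP+foo]` with `foo = -0x42` and `[EBP-0x42]` name the same location.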
Pointers - can easily be understood via basic x86 assembly
The magic word to understand a pointer is "indirection".
"If you want the value of this, go there." A pointer can point to a memory location. Accessing this memory location is called "dereferencing". Sometimes memory structures are large, and several functions work on them. You don't copy them around, if you can avoid it.
Take a look at this super short C++ code, and its assembly after compilation:
int val = 5;
int *ptr = &val;
In this case it's quite simple:
- 5 is stored in `val`.
- `lea` loads the effective address of `val` into `eax`. In C++ this is equivalent to the `&` (address-of) operator.
- EAX holds the address of `val` (not its value),
- and this is stored in `[ebp+ptr]`, so `ptr` holds the address of the pointed-to variable.
- For the win32 `Sleep()` syscall this is moved into `ECX`, where the `dwMilliseconds` parameter is expected.
Further information on the calling convention and on the Windows syscalls is in the next section. Here the emphasis is that the pointer operations can be simply explained with assembly.
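The indirection described in this section can also be checked from the C++ side. The assembly in the comments is an assumption about what an unoptimized 32-bit build roughly emits, not the listing from the article; the function name is made up.

```cpp
#include <cassert>

// Writes through the pointer and returns val, to show that ptr and val
// refer to the same memory location.
inline int write_through_pointer() {
    int val = 5;       // roughly: mov dword ptr [ebp+val], 5
    int *ptr = &val;   // roughly: lea eax, [ebp+val] / mov [ebp+ptr], eax
    assert(ptr == &val);   // ptr holds the address, not the value
    assert(*ptr == 5);     // dereferencing follows the address
    *ptr = 7;              // a write through the pointer changes val
    return val;
}
```

Calling `write_through_pointer()` returns 7, because nothing was copied: there is only one `int`, reached via two names.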
System Calls - why do reversers care about calls and Handles on Windows?
System calls are points of interaction between the process and the OS or hardware. On Windows you have the Windows API, which is a facade. On Linux you have syscall.h.
Take a look at this C++ code, which uses a WinAPI call:
HANDLE hStdout;
DWORD charsWritten;
LPCSTR text = "Hello, world!\n";
hStdout = GetStdHandle(STD_OUTPUT_HANDLE);
WriteFile(hStdout, text, 14, &charsWritten, NULL);
And in IDA Pro (with Powerpoint magic):
(sorry, typo in the slide, I know)
The parameters for the `WriteFile` syscall are `push`'ed on the stack in the order which is indicated by the MSDN documentation.
The definition of the WinAPI call is given there:
BOOL WINAPI WriteFile(
_In_ HANDLE hFile,
_In_ LPCVOID lpBuffer,
_In_ DWORD nNumberOfBytesToWrite,
_Out_opt_ LPDWORD lpNumberOfBytesWritten,
_Inout_opt_ LPOVERLAPPED lpOverlapped
);
- A Handle is a Windows concept. It's used to handle system resources. A `GetStdHandle` call is used to retrieve the handle, which we need.
- You see that the values are pushed in reverse order, because the variables are put onto the stack. A good reading strategy is to go to the `call` you are interested in and to read upwards.
- The Run Time Checks can be ignored here, but this topic is generally relevant for exploit development. But not today.
The WinAPI call does not use `cdecl`. It uses `stdcall`. The difference between `cdecl` and `stdcall` is subject to another section.
The point of this section is that arguments for WinAPI syscalls are laid out by the calling function. The parameters are pushed onto the stack here (that can be different for other calls). Windows needs Handle objects to handle resources. It is quite common to track Handles with a debugger, to start reversing an application via its points of interaction with the system. OllyDBG can do that.
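Why the arguments show up in reverse order above the `call` can be sketched with a toy stack. This is a simplified model under stated assumptions: `ToyStack` and `push_right_to_left` are hypothetical names, and the return address that `call` pushes is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy stack: the back of the vector is the top of the stack.
struct ToyStack {
    std::vector<std::uint32_t> slots;
    void push(std::uint32_t value) { slots.push_back(value); }
    // arg(0) is the slot closest to the top, i.e. what was pushed last.
    std::uint32_t arg(std::size_t index) const {
        assert(index < slots.size());
        return slots[slots.size() - 1 - index];
    }
};

// The caller pushes the last parameter first (right to left).
inline ToyStack push_right_to_left(const std::vector<std::uint32_t>& args) {
    ToyStack stack;
    for (auto it = args.rbegin(); it != args.rend(); ++it) {
        stack.push(*it);
    }
    return stack;
}
```

With five placeholder arguments `{1, 2, 3, 4, 5}`, `arg(0)` is 1 and `arg(4)` is 5: the first parameter ends up nearest to the top of the stack, which is why reading upwards from the `call` lists the parameters in documentation order.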
Conditional jumps are contextual instructions. The conditions are usually in a `cmp` instruction before a conditional jump such as `ja`. Conditional jumps are mostly used for loop constructs and if-then-else compound expressions.
Reading heuristic: if the jumps go to two different locations it's an OR, if they go to the same location it's an AND
This is simple:
- if the one condition of an `AND` compound expression fails, it can never be `True`. Therefore the 2nd `cmp` here gets skipped.
- for an `OR` compound expression that is different. If the first condition fails, the second one still has to be evaluated.
This can speed up reading assembly a lot.
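The heuristic comes from short-circuit evaluation in C and C++. A minimal sketch that counts how many conditions get evaluated; each evaluation stands for one `cmp` in the disassembly. `Probe` and the two functions are hypothetical names.

```cpp
#include <cassert>

// Counts evaluations; one evaluation corresponds to one cmp/jcc pair.
struct Probe {
    int evaluations = 0;
    bool test(bool outcome) {
        ++evaluations;
        return outcome;
    }
};

inline int cmps_for_and(bool first, bool second) {
    Probe p;
    const bool result = p.test(first) && p.test(second);
    (void)result;  // only the evaluation count is of interest here
    return p.evaluations;
}

inline int cmps_for_or(bool first, bool second) {
    Probe p;
    const bool result = p.test(first) || p.test(second);
    (void)result;
    return p.evaluations;
}

inline void check_short_circuit() {
    assert(cmps_for_and(false, true) == 1);  // first fails: second cmp skipped
    assert(cmps_for_and(true, true) == 2);
    assert(cmps_for_or(false, true) == 2);   // first fails: second still needed
}
```

In the `AND` case both failing jumps lead to the same "false" location. In the `OR` case a hit on the first condition jumps straight into the body, while a miss falls through to the second `cmp`.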
Reading heuristic: dashed lines indicate a loop
In order to illustrate this I use BinNavi. It has nicer graphs which look better. For this level of reverse engineering BinNavi is not needed typically. But for the looks of it it's worth a shot.
You can see the local variables at the bottom. So you don't need to right-click like in IDA Pro.
The usage of the variable `bar` is tracked. There is a `cmp` against `5h`. In case the condition for `jg` is not met we follow the red line, and iterate further. That is the head of the loop.
The loop body is on the right. So `EBP+BAR` needs to be greater than 5 (decimal) before we stop looping over the left block (loop body).
In the loop body block is an `add`. 1 is added to the register, which holds the value of `bar`; after the `mov`. This loop body has a non-conditional jump (`jmp`) back to the loop head, where the condition then is re-evaluated.
Once `bar` is larger than 5 the right block is executed. Where all that iteration work is undone, and `bar` gets set to 0 again. What a pity... now we have to do another loop example.
This is an example of a `while (bar <= 5)` loop. If it's greater than 5, we stop. And then we move on with the control flow.
On the left this is the function epilogue. Easy to spot by the `retn` and the `mov esp, ebp`.
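For comparison, here is a guess at C++ source that would produce the control flow of this first loop example. It is a reconstruction from the description only (loop until `bar` is greater than 5, then reset it to 0), not the original source. `LoopResult` and `run_while_example` are hypothetical names.

```cpp
#include <cassert>

struct LoopResult {
    int iterations;       // how often the loop body ran
    int bar_at_exit;      // value of bar when the jg finally left the loop
    int bar_after_reset;  // value after the block that follows the loop
};

inline LoopResult run_while_example(int bar) {
    LoopResult result{0, 0, 0};
    while (bar <= 5) {    // loop head: cmp bar, 5 / jg leaves the loop
        bar = bar + 1;    // loop body: mov, add 1, mov back
        ++result.iterations;
    }                     // jmp back to the loop head
    assert(bar > 5);      // exit condition of the loop
    result.bar_at_exit = bar;
    bar = 0;              // block after the loop resets bar
    result.bar_after_reset = bar;
    return result;
}
```

Starting with `bar = 3` the body runs three times and leaves with `bar == 6`. Starting with `bar = 9` the body never runs, because the condition is checked at the loop head.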
Here is another loop, as promised:
The local loop variable `i` is highlighted. It's loaded into a register, and the `add` instruction increments the value. Nothing new here. Unless the `jz` after the `cmp` jumps out of the loop we will have an unconditional `jmp` back to the loop head.
Practically this illustrates the difference between a `while` loop (example 1) and a `for` loop (example 2). The `for` loop for example 2 looks like:
for (int i = bar; i != 5; i = i + 1)
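A runnable sketch around that `for` loop. The function name and the precondition are assumptions: with `i != 5` as the exit test the loop only terminates cleanly if it starts at or below 5, so the sketch asserts that.

```cpp
#include <cassert>

// Counts how often the loop body runs for a given start value.
inline int count_for_iterations(int bar) {
    assert(bar <= 5);  // assumed precondition, otherwise i never equals 5
    int iterations = 0;
    for (int i = bar; i != 5; i = i + 1) {  // cmp i, 5 / jz leaves the loop
        ++iterations;
    }
    return iterations;
}
```

`count_for_iterations(2)` returns 3 and `count_for_iterations(5)` returns 0. The equality test is what shows up as `jz` in the disassembly, where the first example used `jg`.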
Calling convention and function reverse engineering
Arguments of a function, like `function(var1, var2, var3)` will be handled depending on the calling convention. There are different types of calling conventions, which might be chosen depending on the optimization strategy of the compiler. Is it about getting a small binary or a fast execution?
Reading heuristic: if it's relatively positive to EBP, it's a function argument. If it's relatively negative to EBP it's a local function variable
Generally the stack grows into lower address space.
If you see a variable that is relatively addressed to EBP, like `[EBP+8]` - with a positive offset - that means it's above the frame pointer. That usually means it's a parameter.
If you see something like `[EBP-8]` it usually is a local variable inside of the function.
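The heuristic can be written down as a tiny classifier. It assumes the standard 32-bit frame with a frame pointer: saved `EBP` at `[EBP]`, return address at `[EBP+4]`, first argument at `[EBP+8]`. Code built with frame pointer omission does not follow this layout. The names `SlotKind` and `classify` are made up.

```cpp
#include <cassert>
#include <cstdint>

enum class SlotKind { Local, SavedEbp, ReturnAddress, Argument };

// Classifies an EBP-relative displacement in a standard 32-bit stack frame.
constexpr SlotKind classify(std::int32_t disp) {
    return disp < 0 ? SlotKind::Local            // below EBP: local variables
         : disp < 4 ? SlotKind::SavedEbp         // [EBP]: caller's EBP
         : disp < 8 ? SlotKind::ReturnAddress    // [EBP+4]: pushed by call
         : SlotKind::Argument;                   // [EBP+8] and up: arguments
}

inline void check_classify() {
    assert(classify(8) == SlotKind::Argument);
    assert(classify(-8) == SlotKind::Local);
}
```

This also explains why the first argument sits at `+8` and not at `+0`: two DWORDs (saved `EBP` and return address) lie between the frame pointer and the arguments.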
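There is a function prologue and an epilogue. Functions are sometimes loosely called basic blocks (strictly speaking a basic block is a straight run of instructions without branches, and a function is built from several of them). You meet the term for example when it's about code coverage, and binary instrumentation. The latter is a fancy word for debugging actually. Very academic.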
Reading heuristic: from `push ebp` to `retn`
You can identify basic blocks, which usually enclose a function, by looking between the instructions:
push ebp ; prologue
mov ebp, esp
...
mov esp, ebp ; epilogue
pop ebp
retn
The `mov` and `pop` instructions of the epilogue are equivalent to a single `leave`.
Note that compilers sometimes inline functions inside of each other. Then you will see the prologue in the middle of a basic block / function. That's a performance optimization technique.
cdecl, stdcall, fastcall - what does that mean?
Take a look at the assembly for the `write_more_garbage()` function again:
After the WinAPI call block, which is highlighted in yellow, you can see that there are these RTC checks again. But what sticks out is that after the RTC `call` there are 5 `pop` instructions. These take 4 bytes (32 bits) off the stack, each. This clears out the call stack we needed for the WinAPI call. The code was compiled without optimizations. Otherwise this can be done faster in one instruction. You can thank Visual C++ for that.
The `write_more_garbage()` function is using `cdecl`. IDA Pro indicates this on top of a function block.
The `main` function also uses `cdecl`, and the `loop_main()` function as well. The latter takes 3 arguments, which are `push`'ed before it's called. Then the called function has its function prologue where it sets up `EBP`. And that is also the reason why parameters are referenced relatively positive to `EBP`. They are on the stack before `EBP` is initialized in the called function.
As you saw WinAPI calls make use of `stdcall`, and not `cdecl`. In practice that means that the callee cleans up the stack, so you will not see the caller doing it after the call. The arguments are still `push`'ed before the WinAPI `call`. `stdcall` is similar to `cdecl`; the difference is who cleans up the stack: the caller for `cdecl`, the called function for `stdcall`.
If a function uses `fastcall` you will see that not all parameters are `push`'ed. At least some of them will be `mov`'ed into registers, because these have much faster access times.
C++ compilers may also use `thiscall`, which is similar to `cdecl`. We will see that the called function cleans up the stack. But C++ reverse engineering is a difficult endeavor. Generally the `this` pointer is pushed to the stack last. With some compilers `ECX` will hold the `this` pointer under `thiscall`. However do not take this for granted.
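Who cleans up the stack is the part that is easiest to mix up, so here is a toy model that only tracks `ESP`. It is a sketch under simplifying assumptions (4 bytes per argument, no saved registers), and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class Convention { Cdecl, Stdcall };

struct CallTrace {
    std::uint32_t esp_after_return;  // ESP right after the callee returned
    std::uint32_t esp_final;         // ESP after caller-side cleanup, if any
    bool caller_cleaned;             // true if the caller had to adjust ESP
};

inline CallTrace simulate_call(Convention conv, std::uint32_t esp,
                               std::uint32_t arg_count) {
    const std::uint32_t start = esp;
    const std::uint32_t arg_bytes = 4 * arg_count;
    esp -= arg_bytes;  // caller: one push per argument
    esp -= 4;          // call: pushes the return address
    esp += 4;          // ret: pops the return address
    if (conv == Convention::Stdcall) {
        esp += arg_bytes;  // retn N: the callee removes the arguments
    }
    CallTrace trace{esp, esp, false};
    if (conv == Convention::Cdecl) {
        trace.esp_final = esp + arg_bytes;  // caller: add esp, N (or pops)
        trace.caller_cleaned = true;
    }
    assert(trace.esp_final == start);  // the stack is balanced either way
    return trace;
}
```

In a listing this means: an `add esp, N` (or a row of `pop`s) right after a `call` hints at `cdecl`, while a `retn N` at the end of the called function hints at `stdcall`.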
Return values from functions
Return values will often be in `EAX`. That being said there is a confusing `movzx` instruction used below.
This is because we don't know the high bytes of `EAX`. `AL` is the low 8 bits of `EAX`. The rest gets zero'ed out with `movzx`. `movzx` is useful for Shellcode as well. The return value of the function simply doesn't need the entire register.
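What `movzx` buys you can be shown with a bit-level model. The two functions are hypothetical stand-ins for `mov al, value` and `movzx eax, al`; the stale register content is an arbitrary example value.

```cpp
#include <cassert>
#include <cstdint>

// Model of "mov al, value": only the low byte changes, the rest is stale.
constexpr std::uint32_t mov_al(std::uint32_t eax, std::uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;
}

// Model of "movzx eax, al": the low byte is kept, the upper 24 bits are zeroed.
constexpr std::uint32_t movzx_eax_al(std::uint32_t eax) {
    return eax & 0x000000FFu;
}

inline void check_movzx() {
    const std::uint32_t stale = 0xDEADBEEFu;             // leftover content
    const std::uint32_t after_mov = mov_al(stale, 0x01u);
    assert(after_mov == 0xDEADBE01u);                    // not a clean 1
    assert(movzx_eax_al(after_mov) == 0x00000001u);      // now EAX == 1
}
```

A function returning a `bool` or `char` only produces `AL`. Zero-extending makes the whole of `EAX` usable as the return value.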
I compiled the code with Visual C++, to have a real world example. I don't like reverse engineering tutorials or articles with examples which are too artificial. You have to teach people to spot the patterns. Like in driving school you are being taught how a STOP sign looks. And once you learned that, spotting it is easy. Which doesn't mean that everyone stops. Computers are different in that regard. For now. That's beside my point, which is that malware analysis and exploit development have in common that it's about patterns.
With `xor eax, eax` EAX is reset to 0. A simpler expression is `mov eax, 0`. But the `xor` form is shorter and at least as fast. In exploit development or code deobfuscation you will need to work a lot with XOR.
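Two properties of XOR cover most of what you meet in practice: a value XORed with itself is 0, and XORing twice with the same key gives back the original. A minimal sketch; the function names and the key `0x5A` are made-up examples, not taken from any particular sample.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Model of "xor eax, eax": any value XORed with itself is 0.
constexpr std::uint32_t xor_self(std::uint32_t eax) {
    return eax ^ eax;
}

// Single-byte XOR over a buffer. Applying it twice with the same key
// restores the input, so the same routine encodes and decodes.
inline std::string xor_bytes(std::string data, std::uint8_t key) {
    for (char& c : data) {
        c = static_cast<char>(static_cast<std::uint8_t>(c) ^ key);
    }
    return data;
}

inline void check_xor() {
    assert(xor_self(0xDEADBEEFu) == 0u);
    const std::string plain = "pop pop ret";
    const std::string encoded = xor_bytes(plain, 0x5Au);
    assert(encoded != plain);
    assert(xor_bytes(encoded, 0x5Au) == plain);
}
```

When a loop in a disassembly does nothing but `xor` each byte of a buffer with a constant, it is usually worth trying that constant as a decoding key.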
Now we know how to read x86 assembly. It's not crazy hard and doesn't take a lot of time if you spot the patterns of the control flow. Everyone can do it, because assembly isn't complex. It's just very low level and therefore verbose.
x86 assembly is a must-know for a security engineer, who needs to deal with Malware or Shellcode. That isn't for everyone, but at least the basics are. And this is an extremely basic summary of how it works. Working with IDA Pro and BinNavi is like driving, like navigating, through a binary. At some point there is a familiarity with the conditions and the control flow, and then it really doesn't matter if you fire up a debugger or only your disassembler. As long as you get the control flow and understand what you are doing. I usually do both.