X86 - the 32 bit assembly workflows you need in Information Security for Malware Analysis and Exploit Development

asm
x86_32
coding
offdev
reverse_engineering
malware-analysis
ia32


It's not about reading CISC, it's about the flows

In Exploit Development and Malware Analysis it’s not about reading assembly. It’s about understanding and directing the control flow. That is valuable. Who needs all these countless lines of instructions anyway?

  • In case you are writing Shellcode you need the low-level controls of assembly to create compact pieces of injection code. It just needs to fit and re-direct the control flow.
  • In case of Malware Analysis you usually do not have the source code, and therefore you fall back to the low-level and you have to use a disassembler.

Like two sides of the same coin: assemble Shellcode, disassemble Malware. Whoever can do the one can also do the other. That’s where the value is: in creating the control flow or in reverse-engineering it. The rest is clutter.

But many beginners are impressed if they fire up IDA Pro or OllyDBG for the first time. After reading this wiki article you won’t be impressed any more. You will be impressive.
And the only thing left to think about is whether you want to write pop pop ret or the hex equivalent. Which is?

tl;dr: There are assembly reading strategies. There is a structure to it. Make use of it, save time, get the job done. That’s what this wiki page is about.


Assembly references

This wiki article focuses on patterns and reading strategies rather than concepts. If you are really interested, read Chris Eagle’s IDA Pro book. I personally did not like Randall Hyde’s “The Art of Assembly Language” because it’s too artificial.

IA 32 general purpose registers

The general purpose registers can be used for almost anything. EIP, EFLAGS and the segment registers at the end of the list have fixed roles. Commonly the following orientation points apply:

EAX usually has return values (depending on the calling convention of course). Often used for addition and multiplication.
ECX is used as a counter and the this pointer in C++ (depending on the compiler of course).
EBP is used to reference args and vars (from the stack)
ESI and EDI are typically used as source and destination pointers for memory operations (string and block copies)
EIP is the finger, which points to what is going to be executed next. That’s why shellcoders like it.
EFLAGS is used to represent computation results.
CS, DS and SS point to the code, data and stack segment of the process.

WORD is a set of bits

In IA-32 a register is 32 bits. That is a double-word, or a DWORD.

That means EAX, EBX, ECX, EDX … (the IA-32 registers) get accessed as a DWORD. That is 4 bytes: 8 bits x 4 = 32 bits = DWORD. Simple as that. You can address the parts of a register like this:

EAX = DWORD = 32 bits
 AX = low WORD of EAX = 16 bits
 AH = high byte of AX = 8 bits
 AL = low byte of AX = 8 bits

That mostly matters for the mov instructions. We will use AL later. The sub-registers still show up, even in x86-64. Obviously…
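The overlay of the sub-registers can be sketched in C++ with plain bit masks. This is only a model of the layout described above; the helper names ax, ah, al and mov_al are made up for this illustration and are not part of any real API.

```cpp
#include <cstdint>

// Hypothetical helpers that model how AX, AH and AL overlay EAX.
constexpr std::uint16_t ax(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax & 0xFFFFu);       // low 16 bits
}
constexpr std::uint8_t ah(std::uint32_t eax) {
    return static_cast<std::uint8_t>((eax >> 8) & 0xFFu);   // bits 8..15
}
constexpr std::uint8_t al(std::uint32_t eax) {
    return static_cast<std::uint8_t>(eax & 0xFFu);          // bits 0..7
}

// Models "mov al, value": only the lowest 8 bits of EAX change.
constexpr std::uint32_t mov_al(std::uint32_t eax, std::uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;
}
```

With EAX = 0x12345678 this gives AX = 0x5678, AH = 0x56 and AL = 0x78. Writing 0xFF to AL leaves the upper 24 bits untouched.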

Addressing - relative and indirect

In IDA Pro disassembly [EBP+foo] may be something like [EBP+0x42] or [EBP-0x42]. There will always be a +, because arithmetically 7 + (-2) == 7 - 2. IDA prefers symbolic names over raw numbers in its disassembly listings, so it will always show +foo, even when foo stands for a negative offset.

x86 uses “relative addressing”. But with IDA Pro this is not a real problem.
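Why a + works for negative offsets as well can be shown with a minimal sketch. It assumes a 32-bit base register and a signed 32-bit displacement; the function name effective_address and the example addresses are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of [EBP+disp]: the signed displacement is added
// to the base with 32-bit wrap-around (two's complement).
constexpr std::uint32_t effective_address(std::uint32_t ebp, std::int32_t disp) {
    return ebp + static_cast<std::uint32_t>(disp);
}
```

A displacement of -8 is stored as 0xFFFFFFF8. Adding it to the base gives the same address as subtracting 8, which is why [EBP+foo] can stand for both directions.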

Pointers - can easily be understood via basic x86 assembly

The magic word to understand a pointer is “indirection”.

“If you want the value of this, go there.” A pointer can point to a memory location. Accessing this memory location is called “dereferencing”. Sometimes memory structures are large, and several functions work on them. You don’t copy them around, if you can avoid it.

Take a look at this super short C++ code, and its assembly after compilation:

	int val = 5;
	int *ptr = &val;

	Sleep(*ptr);

In this case it’s quite simple:

  • 5 is moved into val.
  • then lea loads the effective address of val into eax. In C++ this is equivalent to &val.
  • EAX now holds the address of val.
  • and this address in EAX is mov'ed to [ebp+ptr], which is the stack slot of the pointer variable ptr.

For the Win32 Sleep() call the value behind the pointer ends up in ECX. From there it is handed over as the dwMilliseconds parameter. Sleep() is a WinAPI function (stdcall), so the parameter itself is passed on the stack.

More information on the calling convention and on the Windows calls is in the next sections. Here the emphasis is that pointer operations can be explained simply with assembly.
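The same steps can be written as a small portable sketch, with the assembly pattern from the list as comments. The function names are made up, Sleep() is left out so the code does not depend on Windows, and the exact instructions a compiler emits will differ.

```cpp
// Reads the value through the pointer, like the Sleep(*ptr) argument.
constexpr int read_through_pointer() {
    int val = 5;        // mov [ebp+val], 5
    int* ptr = &val;    // lea eax, [ebp+val] ; mov [ebp+ptr], eax
    return *ptr;        // load ptr, then read the value it points to
}

// Writing through the pointer changes val itself; nothing is copied.
constexpr int write_through_pointer() {
    int val = 5;
    int* ptr = &val;
    *ptr = 7;           // store to the address held in ptr
    return val;
}
```

The second function is the reason why large structures are passed around as pointers: every function works on the same memory location.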

System Calls - why do reversers care about calls and Handles on Windows?

System calls are points of interaction between the process and the OS or hardware. On Windows you have the Windows API, which is a facade. On Linux you have syscall.h.

Take a look at this C++ code, which uses a WinAPI call:

bool write_more_garbage()
{
	LPSTR text = "Hello, world!\n";
	DWORD charsWritten;
	HANDLE hStdout;

	hStdout = GetStdHandle(STD_OUTPUT_HANDLE);
	WriteFile(hStdout, text, 14, &charsWritten, NULL);
	return 0;
}

And in IDA Pro (with Powerpoint magic):

(sorry, typo in the slide, I know)

The parameters for the WriteFile call are push'ed onto the stack following the parameter list in the MSDN documentation, last parameter first.

The definition of the WinAPI call is given there:

BOOL WINAPI WriteFile(
  _In_        HANDLE       hFile,
  _In_        LPCVOID      lpBuffer,
  _In_        DWORD        nNumberOfBytesToWrite,
  _Out_opt_   LPDWORD      lpNumberOfBytesWritten,
  _Inout_opt_ LPOVERLAPPED lpOverlapped
);
  1. A Handle is a Windows concept. It’s used to handle system resources. Here a call to GetStdHandle retrieves the handle we need.
  2. You see that the values are pushed in reverse order (last parameter first), so that the first parameter ends up on top of the stack. A good reading strategy is to go to the call you are interested in and to read upwards.
  3. The Run Time Checks can be ignored here. This topic is generally relevant for exploit development, but not today.

The WinAPI call does not use cdecl. It uses stdcall

The difference between cdecl and stdcall is the subject of a later section.

The point of this section is that arguments for WinAPI calls are laid out by the calling function. The parameters are pushed onto the stack here (that can be different for other calls). Windows needs Handle objects to handle resources. It is quite common to track Handles with a debugger, to start reversing an application via its points of interaction with the system. OllyDBG can do that.
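The push order can be sketched with a miniature stack. MiniStack and push_right_to_left are hypothetical names for this illustration, and the numbers 1 to 5 merely label the five WriteFile parameters in declaration order.

```cpp
#include <cstdint>

// Hypothetical miniature stack; slot values are only labels.
struct MiniStack {
    std::uint32_t slots[8] = {};
    int count = 0;

    constexpr void push(std::uint32_t value) { slots[count++] = value; }

    // from_top(0) models [ESP], from_top(1) models [ESP+4], and so on.
    constexpr std::uint32_t from_top(int index) const {
        return slots[count - 1 - index];
    }
};

// Pushes five arguments the way the caller does: last parameter first.
constexpr MiniStack push_right_to_left(std::uint32_t a1, std::uint32_t a2,
                                       std::uint32_t a3, std::uint32_t a4,
                                       std::uint32_t a5) {
    MiniStack stack{};
    stack.push(a5);   // lpOverlapped
    stack.push(a4);   // lpNumberOfBytesWritten
    stack.push(a3);   // nNumberOfBytesToWrite
    stack.push(a2);   // lpBuffer
    stack.push(a1);   // hFile
    return stack;
}
```

Right before the call instruction the first parameter (hFile) sits on top of the stack. That is why reading upwards from the call gives you the parameters in their documented order.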

Conditional jumps

Conditional jumps are contextual instructions. The conditions are usually in a test or cmp instruction before a jz or ja. Conditional jumps are mostly used for loop constructs and if-then-else compound expressions.

Reading heuristic: if the jumps go to two different locations it's an OR, if they go to the same location it's an AND

This is simple:

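  • if one condition of an AND compound expression fails, the whole expression can never be True. Therefore the 2nd cmp gets skipped, and both conditional jumps lead to the same location (past the body).
  • for an OR compound expression that is different. If the first condition holds, the jump goes straight into the body. Only if it fails does the 2nd cmp get evaluated, and its jump leads past the body. So the two jumps go to different locations.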

This can speed up reading assembly a lot.
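The heuristic can be made concrete by writing both compound expressions the way a compiler typically lowers them, one comparison and one jump at a time. This is a sketch: the function names, the compared constants 1 and 2, and the jump mnemonics in the comments are assumptions, not taken from a real listing.

```cpp
// if (a == 1 && b == 2) { body }
constexpr bool and_lowered(int a, int b) {
    if (a != 1) return false;   // cmp a, 1 ; jnz skip   (target: skip)
    if (b != 2) return false;   // cmp b, 2 ; jnz skip   (same target)
    return true;                // body
}

// if (a == 1 || b == 2) { body }
constexpr bool or_lowered(int a, int b) {
    if (a == 1) return true;    // cmp a, 1 ; jz body    (target: body)
    if (b != 2) return false;   // cmp b, 2 ; jnz skip   (different target)
    return true;                // body
}
```

In the AND version both jumps leave towards the same place. In the OR version the first jump enters the body and the second one skips it.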

Reading heuristic: dashed lines indicate a loop

In order to illustrate this I use BinNavi, because its graphs look nicer. For this level of reverse engineering BinNavi is typically not needed, but for the looks alone it’s worth a shot.

Example 1

You can see the local variables at the bottom. So you don’t need to right-click like in IDA Pro.

The usage of the variable bar is tracked. There is a cmp between bar and 5h. If the condition for jg is not met (bar is not greater than 5) we follow the red line, and iterate further. That is the head of the loop.

The loop body is on the right. So EBP+BAR needs to be greater than 5 (decimal) before we stop looping over the left block (loop body).

The loop body block contains an add: after the mov, 1 is added to the register which holds the value of bar. This loop body ends with an unconditional jump (jmp) back to the loop head, where the condition is then re-evaluated.

Now, if bar is larger than 5 the right block is executed, where all that iteration work is undone and bar gets set to 0 again. What a pity… now we have to do another loop example.

This is an example of a while (bar <= 5) loop. As soon as bar is greater than 5, we stop. And then we move on with the control flow.

On the left this is the function epilogue. Easy to spot by the retn and the mov esp, ebp.

Example 2

Here is another loop, as promised:

The local loop variable i is highlighted. It’s loaded in EAX. The add instruction increments the value. Nothing new here.

Unless the jz after the cmp jumps out of the loop we will have an unconditional jmp backwards.

Practically this illustrates the difference between a while loop (example 1) and a for loop (example 2). The for loop for example 2 looks like: for (int i = bar; i != 5; i = i + 1)
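Both loops can be reconstructed as a sketch. It takes the descriptions literally: example 1 leaves the loop via jg once bar is greater than 5, example 2 leaves via jz once i equals 5. The function names and the iteration counters are additions for illustration, and the instruction comments are approximate.

```cpp
// Example 1: condition at the loop head, exit via jg.
constexpr int while_example(int bar) {
    int iterations = 0;
    while (bar <= 5) {       // cmp [ebp+bar], 5 ; jg leaves the loop
        bar = bar + 1;       // mov eax, [ebp+bar] ; add eax, 1 ; mov [ebp+bar], eax
        ++iterations;        // counter added for illustration only
    }                        // jmp back to the loop head
    return iterations;
}

// Example 2: the for loop given above, exit via jz.
constexpr int for_example(int bar) {
    int iterations = 0;
    for (int i = bar; i != 5; i = i + 1) {   // cmp i, 5 ; jz leaves the loop
        ++iterations;                        // counter added for illustration only
    }
    return iterations;
}
```

Starting at 0, the first loop runs six times (bar = 0 to 5) and the second one five times (i = 0 to 4). The test for "not equal" in the second loop only terminates cleanly when the start value is at most 5.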

Calling convention and function reverse engineering

Arguments of a function, like function(var1, var2, var3) will be handled depending on the calling convention. There are different types of calling conventions, which might be chosen depending on the optimization strategy of the compiler. Is it about getting a small binary or a fast execution?

Reading heuristic: if it's relatively positive to EBP, it's a function argument. If it's relatively negative to EBP it's a local function variable

Generally the stack grows into lower address space.

If you see a variable that is relatively addressed to EBP, like [EBP+8] - with a positive offset - that means it’s above the frame pointer. That usually means it’s a parameter.

If you see something like [EBP-8] it usually is a local variable inside of the function.
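The heuristic follows from the layout of a standard 32-bit stack frame (push ebp, then mov ebp, esp). A minimal sketch, assuming 4-byte arguments and no frame pointer omission; SlotKind, classify and argument_disp are hypothetical names.

```cpp
#include <cstdint>

enum class SlotKind { Local, SavedEbp, ReturnAddress, Argument };

// Classifies a displacement relative to EBP in a standard frame.
constexpr SlotKind classify(std::int32_t disp) {
    return disp < 0 ? SlotKind::Local            // [EBP-4], [EBP-8], ...
         : disp < 4 ? SlotKind::SavedEbp         // [EBP+0]: the caller's EBP
         : disp < 8 ? SlotKind::ReturnAddress    // [EBP+4]: pushed by call
                    : SlotKind::Argument;        // [EBP+8] and above
}

// Displacement of the n-th stack argument, counting from 0.
constexpr std::int32_t argument_disp(int n) {
    return 8 + 4 * n;
}
```

This also explains why the first argument sits at [EBP+8] and not at [EBP+0]: the saved EBP and the return address are in between.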

Identifying functions

There is a function prologue and an epilogue. In between, a function is made up of one or more basic blocks: straight-line runs of instructions without jumps in or out. You will meet that term when it’s about code coverage and binary instrumentation. The latter is a fancy word for debugging actually. Very academic.

Reading heuristic: from `push ebp` to `retn`

You can identify a function by looking between the instructions:

push ebp ; prologue
mov ebp, esp
...
mov esp, ebp ; epilogue
pop ebp
retn

The mov and pop instructions are equivalent to leave.

Note that compilers sometimes inline functions inside of each other. Then you will not find a separate prologue and epilogue for the inlined function; its code sits in the middle of the calling function. That’s a performance optimization technique.

cdecl, stdcall, fastcall - what does that mean?

Take a look at the assembly for the write_more_garbage() function again:

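After the WinAPI call block, which is highlighted in yellow, you can see that there are these RTC checks again. But what sticks out is that after the RTC call there are 5 pop instructions. Each of these takes 4 bytes (32 bits) off the stack. Be careful with the interpretation: WriteFile is stdcall, so it removes its own five arguments when it returns. These pops most likely restore registers that were saved earlier, around the RTC call and in the prologue. Cleaning up arguments in the caller is what you see after a cdecl call, and there it can be done in one instruction (add esp, N). The code was compiled without optimizations. You can thank Visual C++ for that.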

The write_more_garbage() function is using cdecl. IDA Pro indicates this on top of a function block.

The main function also uses cdecl, and the loop_main() function as well. The latter takes 3 arguments, which are push'ed before it’s called. Then the called function has its function prologue where ESP is mov'ed into EBP. And that is also the reason why parameters are referenced relatively positive to EBP: they are on the stack before EBP is initialized in the called function.

As you saw WinAPI calls make use of stdcall, and not cdecl. In practice that means that the callee cleans up the stack: the WinAPI function removes its own arguments when it returns (retn with a byte count). With cdecl the caller has to do that after the call. In both cases the arguments are push'ed before the call. So stdcall is similar to cdecl; the difference is who cleans up the stack.

If a function uses fastcall you will see that not all parameters are push'ed. At least some of them will be mov'ed into registers (ECX and EDX with Microsoft's fastcall), because these have much faster access times.

C++ compilers may also use thiscall. With Visual C++ it is similar to stdcall: the called function cleans up the stack, and ECX holds the this pointer. Other compilers push the this pointer to the stack last. C++ reverse engineering is a difficult endeavor, so do not take any of this for granted.
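Who removes how many bytes can be summarized in a small model. It is a sketch under stated assumptions: Microsoft's 32-bit conventions, every argument exactly 4 bytes wide, and hypothetical names (Convention, register_args, stack_bytes, callee_cleanup, caller_cleanup).

```cpp
enum class Convention { Cdecl, Stdcall, Fastcall };

// Arguments passed in registers: fastcall uses ECX and EDX for the first two.
constexpr int register_args(Convention c, int argc) {
    return c == Convention::Fastcall ? (argc < 2 ? argc : 2) : 0;
}

// Bytes the caller pushes onto the stack before the call.
constexpr int stack_bytes(Convention c, int argc) {
    return 4 * (argc - register_args(c, argc));
}

// Bytes the callee removes itself, visible as "retn N".
constexpr int callee_cleanup(Convention c, int argc) {
    return c == Convention::Cdecl ? 0 : stack_bytes(c, argc);
}

// Bytes the caller removes after the call, visible as "add esp, N" or pops.
constexpr int caller_cleanup(Convention c, int argc) {
    return c == Convention::Cdecl ? stack_bytes(c, argc) : 0;
}
```

For the five WriteFile arguments this gives a callee cleanup of 20 bytes, which shows up as retn 14h inside the API function. For a cdecl function with three arguments the caller removes 12 bytes after the call.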

Return values from functions

Return values will often be in EAX. That being said there is a confusing movzx instruction used below.

This is because we don’t know the high bits of EAX. AL holds the 8 low bits of EAX. The rest gets zeroed out with movzx. movzx is useful for Shellcode as well. The return value of the function (a bool) simply doesn’t need the entire register.
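What movzx does to the register can be modelled with a single mask. The function name movzx_eax_al and the garbage value in the usage note are made up for this sketch.

```cpp
#include <cstdint>

// Models "movzx eax, al": keep the low 8 bits, clear the upper 24.
constexpr std::uint32_t movzx_eax_al(std::uint32_t eax) {
    return eax & 0xFFu;
}
```

If EAX held 0xDEADBE01 before, it holds 0x00000001 afterwards. The caller can then compare the whole register without tripping over leftover upper bits.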

I compiled the code with Visual C++, to have a real world example. I don’t like reverse engineering tutorials or articles with examples which are too artificial. You have to teach people to spot the patterns. Like in driving school, where you are taught what a STOP sign looks like. And once you have learned that, spotting it is easy. Which doesn’t mean that everyone stops. Computers are different in that regard. For now. That’s beside my point, which is that malware analysis and exploit development have in common that it’s about patterns.

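With the xor, EAX is reset to 0. A more obvious expression is mov eax, 0. But xor eax, eax has a shorter encoding (2 bytes instead of 5) and contains no null bytes, which matters for Shellcode. In exploit development or code deobfuscation you will need to work a lot with XOR.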
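Two XOR properties cover most of what you will meet: a value XORed with itself is 0, and XORing twice with the same key restores the original. A minimal sketch; the names xor_self and xor_byte and the key 0x5A are arbitrary choices for illustration.

```cpp
#include <cstdint>

// Models "xor eax, eax": the result is 0 for every input.
constexpr std::uint32_t xor_self(std::uint32_t eax) {
    return eax ^ eax;
}

// Single-byte XOR as used by simple obfuscation: the same
// operation encodes and decodes.
constexpr std::uint8_t xor_byte(std::uint8_t byte, std::uint8_t key) {
    return static_cast<std::uint8_t>(byte ^ key);
}
```

Encoding 0x41 ('A') with the key 0x5A yields 0x1B. Applying the same key again gives back 0x41.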

Summary

Now we know how to read x86 assembly. It’s not crazy hard and doesn’t take a lot of time if you spot the patterns of the control flow. Everyone can do it, because assembly isn’t complex. It’s just very low level and therefore verbose.

x86 assembly is a must-know for a security engineer who needs to deal with Malware or Shellcode. That isn’t for everyone, but at least the basics are. And this is an extremely basic summary of how it works. Working with IDA Pro and BinNavi is like driving, like navigating, through a binary. At some point you become familiar with the conditions and the control flow, and then it really doesn’t matter if you fire up a debugger or only your disassembler, as long as you get the control flow and understand what you are doing. I usually do both.

