Memory - the hidden concept behind software / exploits

x86_32
x86_64
exploit-dev
offdev
coding


This is an incomplete Wiki article that is still being completed. It's beginner-focused material, focused on Windows (10).


Developers certainly embrace complexity. Software developers, exploit devs, coders, all kinds of programmers.

The question remains: when are we going to focus on simplicity?

The thing is, all this talk about exploit development, attack software, cyber weapons… it’s about aligning a bunch of “A”s, “B”s, and some memory addresses, all to trigger some extra code to do something extra. If it were easy, everyone could do it. But simplicity is never easy.


Summary

Did you say management?

Indeed I did. Most devs don’t know much about that. I mean memory management.

There is physical memory, virtual memory, there are Heaps and Stacks… Most modern programming languages / environments take care of this, somehow. Do we want to know how? Actually not: we call it garbage collection for a reason.

But memory isn’t “garbage”. If there is a bug, exploitable or not, you need to dive into the process memory. If you don’t know what it looks like, you will spend lots of time doing guesswork. And we don’t want that, do we? We would like to know the patterns. The patterns of “simplicity”. Of memory. And of exploits while we are at it.

Regarding exploits we have a skill-bubble problem in information security. Lots of talkers, few users. Kind of weird, actually: people want to learn it, but somehow fail. No clue why. But here are my guesses:

  1. newcomers underestimate the basics. Then they ask questions and get answers that seem too complex.
  2. it may seem like a waste of time, unless you know that modern exploit development happens in teams. No one knows it all. And you need a lot of practice before everything makes sense.
  3. exploit dev courses focus on small sample apps, which do not resemble real-world challenges. When confronted with reality, newbies give up easily.
  4. at security cons there is a lot of irony combined with insider jokes. The presentations often suggest that the exploitation happened because of a super obvious mistake, which you can spot easily. In reality, looking at the right places with the right tools is a mindset, which you need to pick up.
  5. there is a large amount of science in exploit development these days. Defeating SEHOP or Control Flow Guard is possible, but not all the time. Also, modern exploit development does not focus on Stack BOFs, and Heap metadata attacks are significantly harder.

tl;dr: learning exploit development requires more than tech insights. It requires investigative capabilities and an out-of-the-box mindset. But you don’t need to learn exploit development to exploit systems. In fact, the simpler the approach to successfully penetrate a system, the better…

You just need to learn how to use, adapt and understand exploits; that’s why knowledge of memory internals is relevant. Exploits are like any other software. The code can get re-used multiple times. With the right mindset.

Physical memory

Can’t touch this, or can you?


CPU regs are physical memory. From a programmatic perspective you can think of x86 registers as hard-coded variables. CPU registers are fast, and compilers certainly want to make use of them. That is why there are calling conventions, which define how compiled code uses the x86 CPU registers and the stack to pass arguments and return values.
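As a minimal sketch: a tiny C function, with comments showing the kind of assembly a 32-bit compiler typically emits under the common cdecl convention. The exact instructions depend on the compiler and optimization level; only the general pattern (arguments on the stack, return value in EAX) is the point here.

```c
/* Sketch: how a calling convention maps C onto registers and the stack.
 * Assuming a 32-bit, unoptimized cdecl build - the exact assembly varies,
 * but the return value travels back to the caller in EAX. */
#include <stdio.h>

int add(int a, int b)
{
    return a + b;          /* typically: mov eax, [ebp+8] / add eax, [ebp+0Ch] / ret */
}

int main(void)
{
    int sum = add(2, 3);   /* typically: push 3 / push 2 / call add / add esp, 8
                              ... and the caller reads the result out of EAX */
    printf("sum = %d\n", sum);
    return 0;
}
```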

An (x86) processor does have cache levels, most importantly Level 1 (L1) and Level 2 (L2). You can find these in marketing brochures, if you are looking for the right CPU for your Gaming PC. Bigger is better.

Then there is a dedicated instruction cache (for executable code): typically the L1 cache is split into an instruction cache and a data cache, sitting next to the unified L2 cache.

In between the registers and the RAM, there are Translation Lookaside Buffers (TLBs) as part of the Memory Management Unit (MMU). The job of these buffers is to cache virtual-to-physical address translations, so that not every memory access has to walk the page tables. Lower the access latency. Make things faster.

Check: memory isn’t garbage. Registers are fast. Memory in modern computers is all over the place.


x86_32 and x86_64

Now let’s think of these registers as state within x86 and x86_64, and as a guide for how data moves through memory. In good old x86 we have the 8 General Purpose Registers (32 bit):

  • EAX - I call it the arithmetic reg. Does calculation. Or returns stuff to caller.
  • EBX - all kinds of stuff
  • ECX - Count Register, often used with loops
  • EDX - often used together with EAX for calculations, or as a pointer to I/O ports (driver stuff)
  • ESI - Source index, often used for string ops, one byte at a time
  • EDI - Destination index, to write locations during string ops and loops
  • EBP - Base pointer. Anchor point for the stack frame. Relative orientation for local variables and arguments
  • ESP - Stack pointer - top of the stack, moves up and down
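To see EBP and ESP at work, here is a small function you can step through in a debugger. The comments describe the prologue and epilogue an unoptimized 32-bit build usually produces; treat them as a sketch, not as the one true output of any particular compiler.

```c
/* Sketch for debugger practice: EBP anchors the frame, ESP moves.
 * Typical unoptimized 32-bit prologue:
 *   push ebp        ; save the caller's frame pointer
 *   mov  ebp, esp   ; EBP = anchor for this frame
 *   sub  esp, 40h   ; reserve room for locals (ESP moves down)
 * Locals then live at [ebp-...], arguments at [ebp+8], [ebp+0Ch], ... */
#include <stdio.h>
#include <string.h>

void greet(const char *name)
{
    char buf[64];                      /* a local, addressed relative to EBP */
    strncpy(buf, name, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    printf("hello, %s\n", buf);
}                                      /* epilogue: mov esp, ebp / pop ebp / ret */

int main(void)
{
    greet("world");                    /* good spot for a breakpoint */
    return 0;
}
```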

Here is a trick question: what is AX, AH / AL? X stands for … X’tended, H stands for High, and L for Low.

Where is Low? Left or right? Check out the word Endianness. It’s important.

At least you should take away that BH is not the guy from the A-Team. His name is BA. And in x86 the Low byte comes first in memory: Little Endian means the least significant byte is stored at the lowest address. Which affects how multi-byte values and memory addresses look in a hex dump.
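You can check this yourself with a few lines of C: dump the bytes of a 32-bit value and see which byte sits at the lowest address.

```c
/* Endianness check: on x86 / x86_64 the least significant byte (0x78)
 * is stored first, at the lowest address. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t value = 0x12345678;
    unsigned char *p = (unsigned char *)&value;

    for (size_t i = 0; i < sizeof value; i++)
        printf("byte %zu: 0x%02x\n", i, p[i]);   /* on x86 this prints 78 56 34 12 */

    return 0;
}
```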

In x86_64 we have 16 similar General Purpose Registers (64 bit):

  • RAX - 64bit Accumulator / Arithmetic reg
  • RBX - all kinds of stuff
  • RCX - 64bit Counter Register, loops
  • RDX - same idea, just 64 bit
  • RSI, RDI, RBP, RSP - the 64 bit versions of their x86 counterparts
  • R8 to R15 - usage depends on the compiler / calling convention

Here is another trick question: how many bits does AX address? EAX - 32. RAX - 64. And AX? Is that the same in x86 and x86_64?

Next… We have 6 x 16 bit Segment Regs:

  • CS - Code Segment (offset that may point to the start of the code segment)
  • DS - Data Segment (offset that may point to the start of the data segment)
  • SS - Stack Segment
  • ES - Extra Segment
  • FS - Undefined ?!
  • GS - Undefined ?!

These usually reference memory locations. Windows uses them one way, Linux the other way… Undefined. Personally I think we should shed some light on the way Windows uses them.

The FS register on Windows

The FS reg on Windows always points to the Thread Information Block (TIB), at least in 32-bit code (on x64 the GS reg takes over that job). This is the most important structure for Windows Shellcode.

The TIB is a control point of Windows processes. It points to the Process Environment Block (PEB). It also points to the Structured Exception Handling chain (SEH chain).

If you disassemble Shellcode you will most likely see that it accesses the FS reg. Sometimes you need to adjust the offsets a little, if you have different versions of Windows. Or if someone made a mistake.

Shellcode usually doesn’t make a Windows API call to get the structural data:

  • FS:[0x00] - Pointer to the start of the SEH chain (a chain of callback functions that defines what happens in case of an exception the developer did not handle)
  • FS:[0x30] - Address of the PEB (every process gets a PEB on Windows)
  • FS:[0x18] - Address of the TIB (on Windows each thread has a TIB. Threads have their own Stack, but share the Heap and .text)

The PEB contains process info, like the Image Base Address, the Heap address, and the loaded modules (kernel32.dll, ntdll.dll…). Note that this is a userland structure. Within the kernel the equivalents are EPROCESS / KPROCESS (see below).
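As a sketch (assuming a 32-bit MSVC build on Windows) you can read the same FS offsets shellcode uses, via the __readfsdword intrinsic, without a single API call:

```c
/* Sketch, 32-bit MSVC on Windows: read the FS offsets shellcode relies on.
 * __readfsdword(offset) is an MSVC intrinsic that reads a DWORD from FS:[offset]. */
#include <windows.h>
#include <intrin.h>
#include <stdio.h>

int main(void)
{
    DWORD seh_head = __readfsdword(0x00);  /* start of the SEH chain        */
    DWORD teb      = __readfsdword(0x18);  /* linear address of the TIB/TEB */
    DWORD peb      = __readfsdword(0x30);  /* address of the PEB            */

    printf("SEH chain head: 0x%08lx\n", seh_head);
    printf("TIB / TEB:      0x%08lx\n", teb);
    printf("PEB:            0x%08lx\n", peb);
    return 0;
}
```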

The GS register on Windows

The GS reg is often used for Thread Local Storage (TLS - and no, not the encryption protocol), or within Kernel context switches.

TLS is very important for Malware Analysis. For example, Ransomware often executes TLS callbacks before the Original Entry Point (OEP). That means if you load a Ransomware sample into Olly, it can infect your system even though the debugger halts execution at the OEP (the default).

In memory internals we refer to Userland as Ring 3, and Kernel as Ring 0. On Windows ntdll.dll is your Syscall gateway from Ring 3 to Ring 0, from Userland to Kernel. This can happen via SYSENTER. And SYSENTER is followed by SYSEXIT.

Windows kernel traps, Executive and Dispatch tables require some knowledge of Windows Internals.

EPROCESS and KPROCESS are important keys to understand Rootkits and Kernel exploits.

In my opinion you need a good understanding of Linked Lists, and of Algorithms and Data Structures in general, here. Because this is implemented just like any other program that makes use of these programming concepts.
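To make that concrete, here is a minimal userland sketch of the doubly linked list pattern Windows uses everywhere (LIST_ENTRY with Flink / Blink, as in the PEB loader module list or the EPROCESS ActiveProcessLinks). The type and field names below are simplified stand-ins, not the real Windows definitions.

```c
/* Sketch of the Flink/Blink doubly linked list pattern (LIST_ENTRY style). */
#include <stdio.h>
#include <stddef.h>

typedef struct _LIST_NODE {
    struct _LIST_NODE *Flink;   /* forward link  */
    struct _LIST_NODE *Blink;   /* backward link */
} LIST_NODE;

typedef struct _MODULE {
    LIST_NODE  Links;           /* the list entry is embedded in the object */
    const char *Name;
} MODULE;

static void insert_tail(LIST_NODE *head, LIST_NODE *entry)
{
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

int main(void)
{
    LIST_NODE head = { &head, &head };          /* an empty list points to itself */
    MODULE a = { {0}, "ntdll.dll" }, b = { {0}, "kernel32.dll" };

    insert_tail(&head, &a.Links);
    insert_tail(&head, &b.Links);

    /* Walk forward via Flink; CONTAINING_RECORD-style pointer math
       recovers the surrounding object from the embedded list entry. */
    for (LIST_NODE *n = head.Flink; n != &head; n = n->Flink) {
        MODULE *m = (MODULE *)((char *)n - offsetof(MODULE, Links));
        printf("%s\n", m->Name);
    }
    return 0;
}
```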

Check: x86 < x86_64. On Windows FS points to userland structs with process info, and GS points to kernel structs with kernel info. Process and memory management use basic programming concepts like doubly linked lists. And there can be stuff before the OEP.

Memory sets the course of the control-flow

Now, let’s wrap up x86’s final points. It’s relevant to have a basic understanding of the internals, because otherwise you look at a debugger and you don’t know what you see. All these registers and flags, states of memory… lots of information.

  • We have the EFLAGS reg. Zero Flag, Negative Flag, Carry… each bit in this reg has a meaning.

  • Our friend: the *IP. It holds the virtual mem address of the next inst, which is going to be executed.

    • EIP / RIP are read-only.
      • We do not write into EIP directly. We hijack the execution flow, alter the control flow… however you want to describe it. We set a different course for the program without using the steering wheel. Instead we (ab)use the internals. Quite simple, actually.

Once you understand the influence of these registers and the internals, you can do magic.
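As a deliberately unsafe sketch (a hypothetical victim program, not anyone's real code): the classic way to steer without the wheel is to corrupt data that the program later uses for an indirect call, for example a function pointer sitting right behind a buffer. EIP/RIP is never written directly; execution just ends up somewhere else.

```c
/* Deliberately unsafe sketch of control-flow hijacking via data corruption.
 * We never write EIP/RIP directly - we overwrite a function pointer that the
 * program later calls, and execution lands somewhere else. */
#include <stdio.h>
#include <string.h>

static void benign(void)   { puts("normal path"); }
static void hijacked(void) { puts("attacker-chosen path"); }

struct target {
    char name[16];
    void (*handler)(void);                 /* sits right after the buffer */
};

int main(void)
{
    struct target t = { "user", benign };

    /* Simulated overflow: 16 'A's fill name[], the following bytes
     * overwrite handler with the address of hijacked(). */
    unsigned char payload[16 + sizeof(void (*)(void))];
    void (*fp)(void) = hijacked;

    memset(payload, 'A', 16);
    memcpy(payload + 16, &fp, sizeof fp);
    memcpy(&t, payload, sizeof payload);   /* name overflows into handler */

    t.handler();                           /* control flow lands in hijacked() */
    return 0;
}
```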

There are control registers, which are important: CR0 to CR4. CR0 (for example its WP / write-protect bit) plays an important role during kernel exploitation.


Summary: x86 and x86_64 need to be explored. It’s learning by doing. After reading my 2-3 tutorials you should grab a debugger, set breakpoints, and try to understand what you see. Modern compilers use a lot of the x86 concepts, even if you just have a hello world in C.

Swap'ing and Page'ing - we know 'ing now

Now if you open your 32 bit hello.exe it gets 4 GB of virtual address space. With the Physical Address Extension (PAE) the machine can use up to 64 GB of physical RAM, but the per-process virtual address space stays at 4 GB. Virtual here means that only a portion of it is mapped / physically allocated per process in the Page Directory.

In virtual memory mapping the Kernel memory gets shared across all processes, while each process gets its own Userland mapping. DLLs and Shared Objects (SOs) can get rebased. That means a DLL may (if that is supported) get a different reference / base address per process.

Not all DLLs get rebased. This leads to the topic of ASLR (Address Space Layout Randomization), which is an exploit mitigation.
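A quick way to watch rebasing / ASLR in action (a Windows-specific sketch): print the base addresses of your own image and of ntdll.dll, then compare across runs and reboots. With ASLR enabled the bases move, which is exactly what exploit code has to cope with.

```c
/* Sketch, Windows-specific: print module base addresses to observe ASLR.
 * GetModuleHandleA(NULL) returns the base of our own image,
 * GetModuleHandleA("ntdll.dll") the base of an already loaded DLL. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HMODULE self  = GetModuleHandleA(NULL);
    HMODULE ntdll = GetModuleHandleA("ntdll.dll");

    printf("image base: %p\n", (void *)self);
    printf("ntdll base: %p\n", (void *)ntdll);   /* DLL bases typically change per boot */
    return 0;
}
```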

Pages can swap

A Page is a portion of memory, typically 4 KB. Memory gets divided into these fixed-size chunks.
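Instead of assuming 4 KB, you can ask the OS. A Windows-specific sketch using GetSystemInfo:

```c
/* Sketch, Windows-specific: query the real page size and allocation granularity. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    printf("page size:              %lu bytes\n", si.dwPageSize);
    printf("allocation granularity: %lu bytes\n", si.dwAllocationGranularity);
    return 0;
}
```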

The gist of this is that the virtual address is part of the process context. Which means a virtual addr like 0x66121212 can map to a different physical RAM address in each process.

The other notable point is that CR3 is involved: it holds the base address of the page tables for the current process. There are multiple Kernel pools on Windows. And depending on the version, the page sizes and default pool sizes differ. Difficult topic. Different article. Maybe.

Swap’ing is when Pages which have not been accessed recently get written out to disk. If something has been swap’ed out and needs to be accessed again, that triggers a Page Fault and the OS brings the Page back in. What gets swap’ed is decided by the OS.

PE / ELF / Object Files

Another thing which somehow became hidden knowledge are so-called Object Files. PECOFF is a binary format; it stands for Portable Executable Common Object File Format.

The format on Linux / BSD is called ELF. It’s important, but the focus of this short wiki post is Windows; again.

(from Code Breakers magazine, 2006, “Security and Anti-Security - Attack and Defense”)

  • Code section - .text. RO - not writable, but executable. insts / code.
  • Data section - .data. NX - not executable, but writable. Initialized variables.
  • BSS - .bss - uninitialized symbols, zero-filled at load, writable, pointers…

Now this gets mapped into memory. Where do Heap and Stack get initialized? The .bss has the highest offset of these sections, because it comes after the .data section. The Heap typically starts after the .bss and grows towards higher addresses, while the Stack sits near the top of the userland address space and grows down.

Also, preferably we want these sections to be $W \oplus X$: writable or executable, but never both.
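To connect the section list to actual code, here is a sketch showing where typical C objects usually end up. The exact section names and placement are compiler- and linker-dependent; this is the common layout described above.

```c
/* Sketch: where typical C objects usually land in a PE image
 * (compiler/linker dependent, but this is the common layout). */
#include <stdio.h>

int initialized_global = 42;       /* .data  - initialized, writable, not executable */
int uninitialized_global;          /* .bss   - zero-filled at load, writable          */
const char *message = "hello";     /* the string literal goes to read-only data       */

int main(void)                     /* the code itself goes to .text - executable, RO  */
{
    printf("%s %d %d\n", message, initialized_global, uninitialized_global);
    return 0;
}
```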

Heap and Stack on Windows on x86

Changelog
  • 26.2.2019 - just found this article in my archives and decided to put it into the public wiki.
