Malware Analysis and Forensics - the art of wandering through fog and darkness with open eyes
These wiki notes are extremely condensed.
They do not sum up the complete application of Malware Analysis methods, or how I work in practice. These notes are limited to what I consider relevant, as part of my shared knowledge-base. This is a growing section. It’s meant to help security engineers.
If you want to contribute to these postings, feel free to sign up here and shoot me a mail to marius - at - because-security.com. Generally speaking I want to keep this post simple and as an overview only. The emphasis on workflows is meant to avoid spreading the misunderstanding that Malware Analysis is about applying tools from a large toolbox. The point is to provide forensic insights, and to apply these in Incident Response.
This wiki post is limited to Malware Analysis and Malware Incident Response in an enterprise setting. Later there will be specific pages for embedded, mobile and IoT. But these will be limited to enterprise systems as well.
- Lab setup
- Static property analysis
- Behavior analysis
- Code Analysis
- Workflow 1 - lab services
- Workflow 1 - behavior records
- Workflow 1.1 - basic debugging
- Workflow 1.2 - image analysis
- From Malware Analysis to Incident Response
- Workflow 2 - make Malware Analysis part of IR
- From Malware incident response to incident reports
- Workflow 3: save early, save often, save all
- Workflow 3.1: involve early, involve professionally
Being able to acquire binaries from infected systems, and to perform Incident Response, is preferable.
- For disks, FTK Imager, EnCase etc. are typically used. But in many cases GNU dd can be enough, if it’s just about making sure that the malicious executable, including its environment (DLLs, SOs), gets acquired from the disk.
- Mandiant Memoryze can be used for volatile data acquisition of live memory on Windows systems.
- Modern Endpoint tools like Carbon Black Enterprise Response or CrowdStrike Falcon have a lot of real-time data acquisition functions.
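Whatever tool produces the image, it helps to record a hash of it right after acquisition, so that later copies can be checked against the original. The following is a minimal sketch, assuming the image is a regular file on disk; the function name hash_image and the chunk size are illustrative choices, not part of any of the tools above.

```python
import hashlib


def hash_image(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a (possibly very large) image file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so a multi-gigabyte image
        # never has to fit into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling hash_image on the acquired image and again on every working copy gives a cheap integrity check before any analysis starts.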
VMware Workstation 12.5 lab setup
I use a Windows host for the VMware Workstation based lab and decided not to bother with Oracle VirtualBox or Linux KVM here. I think Linux KVM with qcow2 can be better for taking snapshots and working with self-defending Malware. I don’t limit myself to any tool, or even OS.
Host operating system isolation with VMware Workstation
Preferably the host OS is a blank install, not joined to a Domain, and if possible offline. VMware Workstation is prone to exploitable software bugs.
Network isolation with VMware Workstation
Host-only networking. VMs become gateways. Services like DNS or HTTP are added one by one. This makes it possible to compare a growing set of behavioral data.
Static property analysis
Focus here is: Windows binaries. No ELF or Mach-O. And no mobile devices. This might be addressed in a later section, but generally it’s not the focus of the post series.
PECOFF static property analysis on RemNux
peframe - these enumerate the properties of the binary file, based on metadata and other static properties. These tools generate various hashes, which you can use for searches. Some also list Windows API functions, which can be of interest.
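To illustrate what such tools read from a file, here is a minimal sketch that pulls a few static properties out of a PE using only the Python standard library. It is not how peframe is implemented; the function name pe_static_properties and the selection of fields are assumptions made for this example. It relies only on the documented layout: the "MZ" signature, the PE header offset stored at 0x3C, and the COFF file header that follows the "PE\0\0" signature.

```python
import hashlib
import struct
from datetime import datetime, timezone


def pe_static_properties(data):
    """Return hashes and a few COFF header fields for the bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF file header: Machine, NumberOfSections, TimeDateStamp.
    machine, sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    compiled = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "machine": hex(machine),
        "sections": sections,
        "compiled": compiled.isoformat(),
    }
```

The hashes are what you would paste into a search, while the compile timestamp and section count are hints only, since Malware authors can forge both.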
PECOFF static property analysis on REMworkstation (Windows 10)
I found that Malware Analysts use OllyDbg a lot. I prefer BinNavi client debuggers and a separate VM.
Blog posts regarding BinNavi are tagged with “610_navi”. Some are public.
Quick list of tools
- Process Hacker - like Task Manager ++
- Process Monitor - Microsoft Sysinternals tool to observe how processes interact with the environment
- TcpLogView - NirSoft tool, which keeps a history of TCP connections
- Wireshark - network sniffer and analysis tool
- CaptureBAT - combined network and API calls observation tool for initial analysis and logging
- Suricata - can use custom IOCs and Emerging Threat rules to classify communication via network behavior. Also for PCAP processing.
- Proxifier (WPAD and SOCKS) or SocksCap. On Linux, proxychains might work for some services.
- Burp or ZAP
- accept-all-ips with iptables NAT for IP-based redirection instead of DNS (by Lenny Zeltser)
- macshift for MAC address spoofing on Windows. A MiTM style of interception.
For reading these logs - fast
Code Analysis refers to analyzing a Malware sample at the assembly language level.
The most common tool for this is the IDA Pro disassembler. Whether you read the disassembly in BinNavi or IDA views is not relevant.
I have a short intro article on assembly reading skills here, and a general article on what to focus upon when you are dealing with malicious or offensive code here. The latter also mentions a couple of x86_64 specifics, and will become public once it’s finished.
As alternatives to IDA Pro I encourage you to look at Bokken and radare2, Medusa or metasm. Sadly you cannot just drag & drop PEs into these tools, and you have to know their limitations if you perform Malware Analysis.
- Intel IA-32 manuals
Workflow 1 - lab services
After static property analysis you continue to investigate the Malware samples in the lab.
The debuggee VM uses RemNux as main gateway and DNS server. RemNux is in host-only networking mode, so that there is no internet access. Later RemNux runs FakeDNS or something similar. The debuggee VM has a “clean-slate” snapshot and a “post-infection” snapshot, so that it can be reverted.
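To make the "FakeDNS or something similar" idea concrete, here is a minimal sketch of a responder that answers every DNS query with the same lab address. It is not the FakeDNS tool itself; the names build_fake_response and serve, and the addresses in the usage note, are assumptions for this example. It only handles single-question queries and always returns an A record, which is usually enough to see which names a sample asks for.

```python
import socket
import struct


def build_fake_response(query, answer_ip):
    """Answer a single-question DNS query with one A record for answer_ip."""
    transaction_id = query[:2]
    # Walk the length-prefixed QNAME labels to find the end of the question.
    pos = 12
    while query[pos] != 0:
        pos += query[pos] + 1
    question = query[12:pos + 5]  # name, zero byte, QTYPE, QCLASS
    # Flags 0x8180: response, recursion desired and available, no error.
    header = transaction_id + struct.pack(">HHHHH", 0x8180, 1, 1, 0, 0)
    # 0xC00C is a compression pointer to the name at offset 12.
    # Then: TYPE A, CLASS IN, TTL 60 seconds, RDLENGTH 4.
    answer = struct.pack(">HHHIH", 0xC00C, 1, 1, 60, 4)
    answer += socket.inet_aton(answer_ip)
    return header + question + answer


def serve(bind_ip, answer_ip, port=53):
    """Answer every UDP DNS query on bind_ip with answer_ip (runs forever)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, port))
    while True:
        query, client = sock.recvfrom(512)
        sock.sendto(build_fake_response(query, answer_ip), client)
```

On the gateway VM this could be started with something like serve("192.168.56.2", "192.168.56.2"), where both addresses are placeholders for the host-only network of your lab. Binding to port 53 normally requires elevated privileges.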
Workflow 1.1 - behavior records
The Malware sample gets observed. For traceability we run tools from the Behavior Analysis section. It’s common to add services one by one, such as a web server with a web app that returns some objects.
Goal is: understand what the Malware sample does.
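Adding services one by one pays off when you compare the records of each run with the previous one. A minimal sketch, assuming each run has been reduced to a list of event strings (the format of these strings and the function name new_behavior are made up for this example):

```python
def new_behavior(baseline, current):
    """Return events seen in the current run but not in the baseline run."""
    return sorted(set(current) - set(baseline))


# Hypothetical records: run 1 with DNS only, run 2 with DNS and HTTP.
run_dns_only = ["dns:update.example.net", "file:C:\\Temp\\a.tmp"]
run_dns_http = [
    "dns:update.example.net",
    "file:C:\\Temp\\a.tmp",
    "http:GET /gate.php",
    "reg:HKCU\\Software\\Run\\updater",
]

for event in new_behavior(run_dns_only, run_dns_http):
    print(event)
```

Everything printed here is behavior that only showed up once the HTTP service was available, which narrows down what to look at next.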
Workflow 1.1 - basic debugging
We can also perform some dynamic analysis with a debugger, such as BinNavi or IDA Pro, or with OllyDbg or WinDbg directly.
Here we re-use the behavioral information to quickly re-trace the functions through instrumentation. The usual workflow is to enumerate the handles and Windows API calls, and to set breakpoints. Often it’s possible to grab some data structures from the program stack or heap. Or even from a temporary file.
Goal is: understand how the Malware sample does it.
Workflow 1.2 - image analysis
Goal is: Create a timeline analysis to determine when the Malware sample did “it”. Timing signatures are essential to align Behavior Analysis and File System Forensics with Code Analysis.
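A timeline like this is essentially a merge of timestamped events from several sources. The following is a minimal sketch, assuming each source has already been exported as (ISO timestamp, description) pairs in the same time zone; the function name build_timeline, the source names and the sample events are hypothetical.

```python
from datetime import datetime


def build_timeline(**sources):
    """Merge events from named sources into one list sorted by time."""
    timeline = []
    for source_name, events in sources.items():
        for timestamp, description in events:
            moment = datetime.fromisoformat(timestamp)
            timeline.append((moment, source_name, description))
    timeline.sort(key=lambda entry: entry[0])
    return timeline


# Hypothetical events from two of the analysis domains.
timeline = build_timeline(
    behavior=[("2017-05-04T10:00:05", "process created: updater.exe")],
    filesystem=[("2017-05-04T10:00:01", "file written: updater.exe")],
)
for moment, source_name, description in timeline:
    print(moment.isoformat(), source_name, description)
```

Clock differences between the VM, the gateway and the host need to be corrected before merging, otherwise the order of events is misleading.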
From Malware Analysis to Incident Response
Indicators of Compromise can be defined, and are supposed to reveal the infection within the corporate network or infrastructure.
List of tools to search for IOC signatures
- OpenIOC - very powerful, and found in products
- Yara - more common
- El Jefe
- Carbon Black Enterprise Response or CrowdStrike Falcon; or Trend Micro Deep Discovery
- see this blog post for more information
- Tripwire, OSSEC etc. - for simple cases
Workflow 2: make Malware Analysis part of IR
Incident Response and Forensics can go hand in hand.
Patterns emerge from Behavior and/or Code Analysis, and these can be formulated as Indicators of Compromise. Many next-generation security appliances produce a lot of logs.
Carbon Black Enterprise Response can log system behavior into Syslog, which you can analyze with a SIEM and Threat Feeds. Often SIEMs like IBM QRadar can be configured to detect basic patterns this way.
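The basic pattern detection mentioned here can be pictured as a search for known IOC values in log lines. This is a minimal sketch and not the log format or query language of any of the products named above; the function name match_iocs, the IOC values and the sample log lines are all hypothetical.

```python
def match_iocs(log_lines, iocs):
    """Return (line_number, ioc_type, value) for each IOC value found."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        for ioc_type, values in iocs.items():
            for value in values:
                # Plain case-insensitive substring match.
                if value.lower() in lowered:
                    hits.append((number, ioc_type, value))
    return hits


# Hypothetical IOCs derived from Behavior and Code Analysis.
iocs = {
    "domain": ["bad.example.net"],
    "filename": ["updater.exe"],
}
# Hypothetical syslog-style lines.
log_lines = [
    "May  4 10:00:01 host1 proc=notepad.exe dst=intranet.example.org",
    "May  4 10:00:02 host2 proc=Updater.exe dst=bad.example.net",
]
for hit in match_iocs(log_lines, iocs):
    print(hit)
```

A SIEM does the same thing at scale and with proper field parsing. Substring matching like this is prone to false positives, so treat the hits as leads and not as verdicts.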
With capable (verbose) next-gen Endpoint Protection it’s possible to enumerate infected systems, and ideally to determine the time of the initial compromise.
Such tools are not AntiVirus products, with signature based similarity checks, but system process monitors with granular activity records.
Goal is: make sure that the analysis can reveal Malware attacks fast, possibly company wide. Understand that modern tools can support the company, to fortify against Malware.
From Malware incident response to incident reports
Reports are essential, to communicate security incidents.
Workflow 3: save early, save often, save all
Golden rules are:
- simple executive summary in common language. No “Indicators of Compromise” etc.
- structured report format, with a linked ToC so that people can read sections individually
- if you make a visualization, it should be non-technical. Technical security data visualizations are nice, but not useful for inter-departmental communication.
- add reference reports (Mandiant, Verizon, local reports etc.)
Risk is “impact” and “likelihood”. Make sure that term is understood well, before you use it. Risk matters, and you need to provide assurance that you address it.
If you have identified IOCs and developed the detection, you need to train the SOC. Attackers tend to come back, and it’s likely that you have a set of IOCs, with different severity levels. Make sure you have some flexibility in these levels, and that you re-escalate with the help of Threat Feed databases.
If you see a re-escalation, you have failed. That is very important to understand. That means there still is a command and control channel, and that the “purge” was not complete. That happens to the best security teams, because defensive security is very hard, if you want to do it right.
Goal is: make sure the company understands how critical Malware analysis is for the business, and that it can contribute to preventing data theft and service disruptions. It’s a common problem, and needs targeted solutions. These solutions need to be embedded in business processes. Make sure that you always note that the attackers can come back, and that security is a process.
- 15.04.2017: Beta release, brain-dump style.
- 03.05.2017: temporarily de-published to refine the structure.
- 04.05.2017: added more links, fixed formulation errors and typos
- 05.05.2017: added more tools I use into the categories
- 18.08.2017: found time to re-organize this post, so that I can add the follow-up sections later
- 23.08.2017: added Venn diagram to clarify that the workflows are designed to make use of the individual advantages of 3 analysis domains