OpenSource Netflow collection with SiLK, FlowBat - and how to perform data analysis

data_science
netflow
linux
ddos
network_forensics

#1

Netflow is a Cisco-driven standard. Essentially, it's what you use to monitor the connections you pipe through this kind of network equipment.

Netflow is very useful for Incident Response on DDoS attacks, for example. However, some network devices run into acute CPU issues when a DDoS comes in, and simply stop producing Netflow records. During a DDoS attack they have better things to do, one could say… but Incident Response on large-scale DDoS attacks against complex networks actually requires Netflow.

One possible solution is generating Netflow (and I mean real Netflow, not just JSON-serialized data that contains Netflow information) from SPAN’ed traffic. A switch can mirror all traffic to a Linux VM, which receives it on a promiscuous network interface. From the incoming traffic the VM produces Netflow v9 records, which CERT NetSA SiLK collects and commits to files on disk. To allow data analysis we can also write a quick JSON serializer, which lets us commit the data into modern Log Management solutions such as Splunk, Sumo Logic or Elasticsearch. We can also have a modern dashboard and generate a network overview with charts and plots. Last but not least, this log data can contribute to asset information systems.

Generate Netflow from a promiscuous interface

Here’s my trick: ipt_netflow is a Linux kernel module with the unique and almost magical ability to generate Netflow records from a local network interface. It has performed well in all my tests.

First we install ipt_netflow in a virtual machine, say a Linux KVM guest residing on a Debian 7 or 8 host. Or we install it directly on a host; that doesn’t matter much here. Check it out:

apt-get update
apt-get install iptables-dev pkg-config build-essential linux-headers-amd64

On Gentoo you need to emerge the relevant iptables packages, but the kernel sources will most likely already be present.

~/Source/ipt-netflow$ ./configure --enable-promisc
Kernel version: 4.4.16-stamus-amd64 (uname)
Kernel sources: /lib/modules/4.4.16-stamus-amd64/build (found)
Checking for presence of include/linux/llist.h... Yes
Checking for presence of include/linux/grsecurity.h... No
Iptables binary version: 1.4.21 (detected from /sbin/iptables)
pkg-config for version 1.4.21 exists: Yes
Checking for presence of xtables.h... Yes
Iptables include flags:  (pkg-config)
Iptables module path: /lib/xtables (pkg-config)
Searching for net-snmp-config... No.
Searching for net-snmp agent... No.
 Assuming you don't want net-snmp agent support.
 Otherwise do:  apt-get install snmpd libsnmp-dev
Checking for DKMS... Yes.
Creating Makefile.. done.

  If you need some options enabled run ./configure --help
  Now run: make all install

Installation done. Next next finish… aeh, make, sudo make install. Now let’s load it (sudo modprobe ipt_NETFLOW). You can configure it via procfs.

sudo /sbin/ifconfig eth0 promisc
sudo /sbin/ifconfig 

eth0      Link encap:Ethernet  HWaddr 52:54:00:5e:c9:7f  
      ...
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:291674658 errors:0 dropped:2 overruns:0 frame:0
...
          RX bytes:232405128376 (216.4 GiB)  TX bytes:1089914538 (1.0 GiB)
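If you would rather check the promiscuous flag from a script than by eyeballing ifconfig output, here is a minimal Python sketch. It assumes a Linux host, where /sys/class/net/<interface>/flags holds the interface flags as a hex string, and uses IFF_PROMISC = 0x100 from <linux/if.h>; the helper names are hypothetical.

```python
# Check the promiscuous-mode flag of a network interface via sysfs (Linux only).
# IFF_PROMISC is 0x100 in <linux/if.h>.
IFF_PROMISC = 0x100


def is_promisc(flags_text):
    """Return True if a sysfs flag string such as '0x1303' has IFF_PROMISC set."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def interface_is_promisc(ifname="eth0"):
    """Read /sys/class/net/<ifname>/flags and test it for IFF_PROMISC."""
    with open("/sys/class/net/%s/flags" % ifname) as fh:
        return is_promisc(fh.read())
```

After the ifconfig command above, interface_is_promisc("eth0") should return True.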

Now let’s direct the traffic into the NETFLOW iptables target and enable the promisc hack:

sudo iptables -A PREROUTING -t raw -i eth0 -j NETFLOW
sudo sysctl net.netflow.promisc=1

And let’s check how it works:

cat /proc/net/stat/ipt_netflow
ipt_NETFLOW 2.2-11-gd4a6bb2, srcversion 804F46CCFDAC5DBDF683FE5; llist promisc
Protocol version 5 (netflow)
Timeouts: active 1800s, inactive 15s. Maxflows 2000000
Promisc hack is enabled (observed 0 packets, discarded 0).
Flows: active 701 (peak 823 reached 0d0h2m ago), mem 4055K, worker delay 25/250 [1..25] (80 ms, 0 us, 248:0 0 [cpu0]).
Hash: size 506501 (mem 3957K), metric 1.00 [1.00, 1.00, 1.00]. InHash: 8745 pkt, 1244 K, InPDU 54, 7684.
Rate: 308412 bits/sec, 272 packets/sec; Avg 1 min: 378219 bps, 293 pps; 5 min: 437430 bps, 305 pps
cpu#     pps; <search found new [metric], trunc frag alloc maxflows>, traffic: <pkt, bytes>, drop: <pkt, bytes>
Total    272;     74 143856  21897 [1.00],    0    0    0    0, traffic: 165753, 27 MB, drop: 0, 0 K
cpu0     272;     74 143856  21897 [1.00],    0    0    0    0, traffic: 165753, 27 MB, drop: 0, 0 K
cpu1       0;      0      0      0 [1.00],    0    0    0    0, traffic: 0, 0 MB, drop: 0, 0 K
cpu2       0;      0      0      0 [1.00],    0    0    0    0, traffic: 0, 0 MB, drop: 0, 0 K
cpu3       0;      0      0      0 [1.00],    0    0    0    0, traffic: 0, 0 MB, drop: 0, 0 K
Export: Rate 2562 bytes/s; Total 706 pkts, 0 MB, 21178 flows; Errors 0 pkts; Traffic lost 0 pkts, 0 Kbytes, 0 flows.
sock0: 127.0.0.1:2055, sndbuf 212992, filled 1, peak 1; err: sndbuf reached 0, connect 0, cberr 706, other 0

Nice!
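If you want to feed these counters into monitoring, here is a minimal Python sketch that extracts the current rate from the stats text. The regular expression is an assumption based solely on the sample output above and may need adjusting for other ipt_netflow versions; read_stats and current_rate are hypothetical helpers.

```python
import re

# Matches the "Rate:" line of the ipt_netflow stats output shown above.
RATE_RE = re.compile(r"^Rate:\s+(\d+) bits/sec,\s+(\d+) packets/sec", re.MULTILINE)


def read_stats(path="/proc/net/stat/ipt_netflow"):
    """Return the raw stats text exposed by the ipt_NETFLOW module."""
    with open(path) as fh:
        return fh.read()


def current_rate(stats_text):
    """Return (bits_per_sec, packets_per_sec), or None if no Rate line is found."""
    m = RATE_RE.search(stats_text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```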

In some cases you need to put the network interface into promiscuous mode before you add the iptables rule, for example if you use a PCIe adapter in an IBM server. In a VM I have not observed issues like this.

Result so far: Netflow from the Linux kernel, generated via a promisc interface, goes to localhost:2055, all inside a VM. How does the traffic get to that lab VM, you ask? You don’t have a switch? Well… look at this little hack:

# port mirroring
iptables -t mangle -A PREROUTING -d $external_IP -j TEE --gateway $vm_ip
iptables -t mangle -A PREROUTING -s $external_IP -j TEE --gateway $vm_ip

Ok… black magic done. Time for some white magic. I recommend using a switch, though. There are tons of virtual switching projects for KVM; strangely enough, port mirroring isn’t among their typical features.

Let's get SiLK installed

SiLK is like a big box filled with network analysis tools, and most network analysts I know are familiar with it. It’s the gold standard in OpenSource Netflow processing and more powerful than anything on the “market”, because its outputs are sane. Most Netflow tools (like SolarWinds) are black boxes and expose only limited functions to the end user.

Security analysts aren’t end users. This is a huge problem with many tools commonly deployed in network monitoring: vendors don’t want to expose data interfaces, due to, aeh… security concerns, such as “don’t ask me”[tm]. The philosophy of most network monitoring tools is focused on performance only. That doesn’t help me when I have to check whether there is a DDoS attack. Sure, lots of incoming bytes… easy to spot, Sherlock. But what kind of traffic is it, how is it distributed across the network segments, and is it possibly a failed failover with a resulting network loop? Maybe… but let’s BGP swing the network, just in case it’s not… :) Why not.

Ask me[tm] - the new standard of network analytics

Let’s download SiLK and compile it. If you followed the ipt_netflow installation section above, you already have a (K)VM with a C/C++ compiler toolchain (build-essential) and a Netflow source. This is an easy starting point. Time for the Netflow collector.

Check the CERT NetSA tools website for the newest tarball.

wget "https://tools.netsa.cert.org/releases/silk-3.12.2.tar.gz"

Now here’s the deal: in 2016 a Netflow collector should support sFlow (useful for load balancers such as F5 BIG-IP) and IPFIX (for the newer Ciscos and Junipers). I also think libpcap support should be in, for analysis, somehow. So what about following the SiLK-on-a-box manual now?

Follow it carefully, and your configure run should end like this:

config.status: executing silk_summary commands

    * Configured package:           SiLK 3.12.2
    * Host type:                    x86_64-unknown-linux-gnu
    * Source files ($top_srcdir):   .
    * Install directory:            /usr/local
    * Root of packed data tree:     /data
    * Packing logic:                via run-time plugin
    * Timezone support:             UTC
    * Default compression method:   SK_COMPMETHOD_NONE
    * IPv6 network connections:     YES
    * IPv6 flow record support:     NO
    * IPFIX collection support:     YES (-L/usr/local/lib -lfixbuf -lpthread -lgthread-2.0 -pthread -lglib-2.0)
    * NetFlow9 collection support:  YES
    * sFlow collection support:     YES
    * Fixbuf compatibility:         libfixbuf-1.7.1 >= 1.6.0
    * Transport encryption support: NO (gnutls not found)
    * IPA support:                  NO
    * ZLIB support:                 YES (-lz)
    * LZO support:                  YES (-llzo2)
    * LIBPCAP support:              YES (-lpcap)
    * C-ARES support:               YES (-lcares)
    * ADNS support:                 NO
    * Python interpreter:           /usr/bin/python
    * Python support:               YES (-Wl,-z,relro -Xlinker -export-dynamic -Wl,-O1 -Wl,-Bsymbolic-functions -L/usr/lib -lz -ldl -lutil -lm -Wl,-z,relro -L/usr/lib/python2.7/config-x86_64-linux-gnu -lpython2.7 -pthread)
    * Python package destination:   /usr/lib/python2.7/dist-packages
    * Build analysis tools:         YES
    * Build packing tools:          YES
    * Compiler (CC):                gcc
    * Compiler flags (CFLAGS):      -I$(srcdir) -I$(top_builddir)/src/include -I$(top_srcdir)/src/include -DNDEBUG -D_ALL_SOURCE=1 -D_GNU_SOURCE=1 -Wall -W -Wmissing-prototypes -Wformat=2 -Wdeclaration-after-statement -Wpointer-arith -fno-strict-aliasing -O3
    * Linker flags (LDFLAGS):       
    * Libraries (LIBS):             -llzo2 -lz -ldl -lm

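Depending on your requirements, you might actually want IPv6 flow record support, which is disabled in the build above (configure option --enable-ipv6). Make sure you enable Python support (--with-python); we will use it later.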

Let's get SiLK configured with ipt_netflow

There are two config files, and their entries need to correspond: every sensor defined in sensors.conf must also be declared in silk.conf.

Sample: /data/sensors.conf

probe local netflow-v9
 listen-on-port 2055
 protocol udp
 log-flags bad
end probe

group my-network
 ipblocks 1.2.3.4/32 
 ipblocks 192.168.100.0/24
end group

sensor local
 netflow-v9-probes local
 internal-ipblocks @my-network
 external-ipblocks remainder
end sensor

Oki, now for the corresponding silk.conf.

Sample: /data/silk.conf

version 2

sensor 0 local    "local"

class all
    sensors local
end class

class all
    type  0 in      in
    type  1 out     out
    type  2 inweb   iw
    type  3 outweb  ow
    type  4 innull  innull
    type  5 outnull outnull
    type  6 int2int int2int
    type  7 ext2ext ext2ext
    type  8 inicmp  inicmp
    type  9 outicmp outicmp
    type 10 other   other

    default-types in inweb inicmp
end class

default-class all

path-format "%N/%T/%Y/%m/%d/%x"

packing-logic "packlogic-twoway.so"

Vim, :set paste, copy & paste, done.
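Since a mismatch between the two files is an easy mistake, a small pre-flight check before starting rwflowpack may help. This is a minimal Python sketch; it only understands the simple one-statement-per-line layout of the samples above and is not a full parser for either file format. The helper names are hypothetical.

```python
def _statements(text):
    """Yield the whitespace-separated tokens of each non-empty line, '#' comments removed."""
    for line in text.splitlines():
        tokens = line.split("#", 1)[0].split()
        if tokens:
            yield tokens


def sensors_in_sensors_conf(text):
    # 'sensor NAME' opens a sensor block; 'end sensor' starts with 'end' and is skipped.
    return {t[1] for t in _statements(text) if len(t) == 2 and t[0] == "sensor"}


def sensors_in_silk_conf(text):
    # silk.conf declares sensors as: sensor <id> <name> ["description"]
    return {t[2] for t in _statements(text)
            if len(t) >= 3 and t[0] == "sensor" and t[1].isdigit()}


def missing_sensors(sensors_conf_text, silk_conf_text):
    """Sensors defined in sensors.conf that silk.conf does not declare."""
    return sensors_in_sensors_conf(sensors_conf_text) - sensors_in_silk_conf(silk_conf_text)
```

Fed with the contents of /data/sensors.conf and /data/silk.conf, missing_sensors() should return an empty set for the samples above.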

sudo /etc/init.d/rwflowpack start
Starting rwflowpack:    rwflowpack: Ignoring --archive-directory since no probes use directory polling
[OK]

Now let's query for the data from ipt_netflow

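Now grab a coffee or pack your gym bag, because this takes some minutes. It’s flow data: a flow only shows up after ipt_netflow has exported it (inactive timeout 15s, active timeout 1800s in the stats above) and rwflowpack has written it to disk. You will figure out how important that is in just… 5-1234567 seconds.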

Also… check netstat -tulpen for UDP port 2055:
udp 0 0 0.0.0.0:2055 0.0.0.0:* 0 18856743 -

This means you can also pipe data from other hosts into this “SiLK on a box”. I recommend using a different port for each device, though. This way you can build profiles and manage the sensors easily.
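If you follow the one-port-per-device advice, the sensors.conf blocks become repetitive. Here is a minimal Python sketch that renders them using the exact syntax of the sample sensors.conf above. The device names, the starting port and the my-network group are placeholders, and the group definition itself still has to be written by hand. Each new sensor also has to be declared in silk.conf (and listed in its class), which the second helper sketches.

```python
def probe_and_sensor_blocks(devices, first_port=2055, internal_group="my-network"):
    """Render sensors.conf probe/sensor blocks, one UDP port per device (hypothetical names)."""
    blocks = []
    for offset, name in enumerate(devices):
        port = first_port + offset
        blocks.append(
            "probe %s netflow-v9\n"
            " listen-on-port %d\n"
            " protocol udp\n"
            " log-flags bad\n"
            "end probe\n" % (name, port)
        )
        blocks.append(
            "sensor %s\n"
            " netflow-v9-probes %s\n"
            " internal-ipblocks @%s\n"
            " external-ipblocks remainder\n"
            "end sensor\n" % (name, name, internal_group)
        )
    return "\n".join(blocks)


def silk_conf_sensor_lines(devices):
    """Matching 'sensor <id> <name> "<description>"' lines for silk.conf."""
    return "\n".join('sensor %d %s "%s"' % (i, n, n) for i, n in enumerate(devices))
```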

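Netflow v9 please

The probe in sensors.conf is declared as netflow-v9, but ipt_netflow exports version 5 by default (see “Protocol version 5” in the stats above), so switch it over: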

sudo sysctl net.netflow.protocol=9

Check.
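To convince yourself that the exporter really speaks v9 now, you can capture a few packets on UDP port 2055 (for example with tcpdump) and look at the header; in both v5 and v9 the first two bytes carry the version number. Here is a minimal Python sketch decoding the 20-byte NetFlow v9 packet header as laid out in RFC 3954. How you obtain the raw UDP payload is left open, and parse_v9_header is a hypothetical helper.

```python
import struct

# NetFlow v9 export packet header (RFC 3954), network byte order:
# version, count, sysUpTime (ms), UNIX secs, sequence number, source ID.
V9_HEADER = struct.Struct("!HHIIII")


def parse_v9_header(packet):
    """Decode the first 20 bytes of a NetFlow v9 export packet into a dict."""
    if len(packet) < V9_HEADER.size:
        raise ValueError("packet shorter than a NetFlow v9 header")
    version, count, uptime_ms, unix_secs, sequence, source_id = V9_HEADER.unpack_from(packet)
    if version != 9:
        raise ValueError("not NetFlow v9 (version field is %d)" % version)
    return {"version": version, "count": count, "sys_uptime_ms": uptime_ms,
            "unix_secs": unix_secs, "sequence": sequence, "source_id": source_id}
```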

Wait for it... wait...

/usr/local/bin/rwfilter --sensor=local --proto=0-255 --pass=stdout --type=all | rwcut | tail

   95.211.83.20|148.251.236.208|56923|51413|  6|        12|      2586|FS PA   |2016/09/25T13:37:12.900|    0.164|2016/09/25T13:37:13.064|local|
  93.169.25.233|148.251.236.208| 2962|51413| 17|         2|       258|        |2016/09/25T13:37:13.084|    0.000|2016/09/25T13:37:13.084|local|
105.103.127.116|148.251.236.208|18229|51413| 17|         2|       290|        |2016/09/25T13:37:13.004|    0.000|2016/09/25T13:37:13.004|local|
...

Now… sure. Grep, AWK, sed, Gnuplot, and time for ASCII graphs and good old CSV files. Or not. It’s not the summer of ’69. We have 2016.
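If you do end up post-processing rwcut text (SiLK’s own rwuniq and rwstats are usually the better tools for aggregation), here is a minimal Python sketch that parses the default pipe-delimited rows shown above and sums bytes per source address. The column order is an assumption taken from that sample output; adjust it if you pass --fields to rwcut.

```python
# Column order as seen in the rwcut output above (an assumption, not a spec).
RWCUT_FIELDS = ("sIP", "dIP", "sPort", "dPort", "protocol", "packets",
                "bytes", "flags", "sTime", "duration", "eTime", "sensor")
INT_FIELDS = ("sPort", "dPort", "protocol", "packets", "bytes")


def parse_rwcut_line(line):
    """Turn one rwcut row into a dict; returns None for the title row or other noise."""
    values = [v.strip() for v in line.rstrip("\n").split("|")]
    if values and values[-1] == "":
        values = values[:-1]          # rwcut ends each row with a trailing '|'
    if len(values) != len(RWCUT_FIELDS) or values[0] == "sIP":
        return None
    rec = dict(zip(RWCUT_FIELDS, values))
    for key in INT_FIELDS:
        rec[key] = int(rec[key])
    rec["duration"] = float(rec["duration"])
    return rec


def bytes_by_source(lines):
    """Sum the bytes column per source IP over all parseable rows."""
    totals = {}
    for line in lines:
        rec = parse_rwcut_line(line)
        if rec is not None:
            totals[rec["sIP"]] = totals.get(rec["sIP"], 0) + rec["bytes"]
    return totals
```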

Results so far

We have:

  • a high-performance Netflow source. We can scale up its performance by adding CPU power to the VM
  • a high-performance Netflow collector with state-of-the-art analysis tools, command-line based though
  • the option to also use sFlow and IPFIX if we want
  • a method to generate real-time Netflow from mirrored traffic, also via an iptables hack
  • a well-suited network analysis toolchain with 100s of functions

Wasn’t that worth 10 minutes?

FlowBat - a Web UI for SiLK

Wait a minute… there is also an automated SiLK-on-a-box installer… and I didn’t tell you. Wow, what a …

The reason is not that I want to waste your time. The reason is that you will run into a problem or two later. Therefore we install only FlowBat with the script, not SiLK. Control is everything, man.

We have already made sure that our SiLK setup is functional. This is a much better starting point. While you execute that script, read some source code. Really. Now here’s a trick question for thy master: is localhost 0.0.0.0 or 127.0.0.1? It depends[tm]. Does it? (If the service only binds to 127.0.0.1, you won’t reach it from another machine.) What about opening http://$host:1800 now to get started?

You will realize that FlowBat isn’t like FlowViewer. FlowViewer is a hacky set of Perl scripts with many many many many many many many many many many bugs, and a 1990s-style Web UI. FlowBat is modern, and a hacky set of Node.js.

Check out the graphs, and get convinced:

Pretty pictures… I am in! The reason why I bore you with all of this is that this is awesome work.

  • You can set up FlowBat to collect the flow records via SSH. This makes it possible to include collectors external to this installation without having to link the systems via an API or anything complex.
  • You can export the records as CSV and include them in your daily manual log processing, for example with Tableau.
  • There are countless stats you can generate with FlowBat, depending on your CIDRs etc. If you cannot build the query with rwfilter, chances are good that you are doing it wrong. Really.
  • The query builder is advanced. I don’t think that there are feature gaps.
  • FlowBat is fast. Faster than FlowViewer, if you know what I mean.

SiLK to JSON for regular log crunching

Some networks are more important than others. For the network ranges you hold dear, I have a good way to keep an eye on them.

Following up on my earlier blog post, we can set up the Sumo Collector, file-forward the JSON output, and post-process the records in the cloud.

Technically this should look very similar to how we handle Suricata’s EVE.json, since the JSON search query operators remain the same. The aggregation can happen remotely; we do not have to do it manually.

Thanks to this, we can get metric-based Netflow alerts on throughput criteria. Such criteria can be based on the anomaly detection options the Sumo service offers, or on simple things like the average amount of incoming bytes per minute, the number of connections etc. It’s mostly up to the network operators to define these, based on statistical analysis of the Netflow data and on experience values.

Here is what I based my workflow on.

#!/usr/bin/python2
# Serialize SiLK flow records (read via PySiLK) as one JSON object per line.
from __future__ import print_function
import json
from silk import SilkFile, READ

TIME_FMT = "%Y-%m-%d %H:%M:%S"

def parse_all(ffile='/tmp/test.rwf'):
    flow = SilkFile(ffile, READ)
    try:
        for rec in flow:
            # build a fresh dict per record instead of reusing one
            d = {
                'stime': rec.stime.strftime(TIME_FMT),
                'etime': rec.etime.strftime(TIME_FMT),
                'sip': str(rec.sip),
                'dip': str(rec.dip),
                'nhip': str(rec.nhip),
                'sport': rec.sport,
                'dport': rec.dport,
                'protocol': rec.protocol,
                'packets': rec.packets,
                'bytes': rec.bytes,
                'icmptype': rec.icmptype,
                'icmpcode': rec.icmpcode,
                'input': rec.input,
                'output': rec.output,
                'application': rec.application,
                'sensor_id': rec.sensor_id,
                'classtype_id': rec.classtype_id,
            }
            print(json.dumps(d))
    finally:
        flow.close()

def main():
    parse_all()

if __name__ == "__main__":
    main()

Now you see why we need that Python support in SiLK. Here’s what you do for a cron job:

  1. run rwfilter, like this: /usr/local/bin/rwfilter --type=all --start-date=$(date -u +%Y/%m/%d:%H) --sensor=local --proto=0-255 --pass=stdout > /tmp/test.rwf. Note that --start-date selects whole hourly files; for finer time windows you need to familiarize yourself with sTime and eTime (rwfilter’s --stime and --etime switches) and the flow intervals. More on that later.

  2. run the Python code like this: python netflow_json.py (the input path is hard-coded in the script)

  Put together, /opt/scripts/netflow.sh looks like this:

    #!/bin/sh
    /usr/local/bin/rwfilter --type=all --start-date=$(date -u +%Y/%m/%d:%H) --sensor=local --proto=0-255 --pass=stdout > /tmp/test.rwf
    /usr/bin/python /opt/scripts/netflow_json.py > /tmp/netflow.json
    rm /tmp/test.rwf

I run this once every hour for a start. You also need to do something about the last bucket of data, which isn’t fully collected yet. And… and… and… yes.

For the crontab:
58 * * * * /opt/scripts/netflow.sh >/dev/null 2>&1
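With the JSON lines in /tmp/netflow.json, a simple throughput criterion such as bytes per minute can be prototyped locally before building it in Sumo. Here is a minimal Python sketch that sums bytes per minute and flags minutes well above the mean. This is a plain mean-based threshold, not Sumo’s anomaly detection; it counts all bytes rather than only incoming ones, and it attributes each flow’s bytes to the minute in which the flow started. The helper names and the factor of 3 are hypothetical choices.

```python
import json
from collections import defaultdict


def bytes_per_minute(json_lines):
    """Sum 'bytes' per minute, keyed by the 'YYYY-MM-DD HH:MM' prefix of 'stime'."""
    totals = defaultdict(int)
    for line in json_lines:
        rec = json.loads(line)
        totals[rec["stime"][:16]] += rec["bytes"]
    return dict(totals)


def minutes_over_threshold(totals, factor=3.0):
    """Return the minutes whose byte count exceeds factor x the mean over all minutes."""
    if not totals:
        return []
    mean = sum(totals.values()) / float(len(totals))
    return sorted(minute for minute, b in totals.items() if b > factor * mean)
```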

SiLK to JSON to Sumo - for Netflow aided security monitoring and data correlation

  • The 1h timeframe is not optimal.
  • The date hack in combination with cron at minute 58 is ridiculous. What if the job starts late? Then date already returns the next hour, and the file will contain (almost) no records for the hour that was meant.
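One way around this is to always query the previous, complete hour, independent of the exact minute the job starts. Here is a minimal Python sketch of that calculation, assuming UTC timestamps like date -u; the helper names are hypothetical, and the rwfilter switches are the same ones used in the cron script above. Since SiLK files flows by start time and long flows are only exported after the active timeout (1800s above), a run shortly after the full hour may still miss a few late records.

```python
from datetime import datetime, timedelta


def previous_hour_start_date(now):
    """rwfilter --start-date value (YYYY/MM/DD:HH) for the last complete hour before `now` (UTC)."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return (top_of_hour - timedelta(hours=1)).strftime("%Y/%m/%d:%H")


def rwfilter_argv(now):
    """Argument list for the hourly rwfilter run, using the switches from the cron script."""
    return ["/usr/local/bin/rwfilter", "--type=all",
            "--start-date=" + previous_hour_start_date(now),
            "--sensor=local", "--proto=0-255", "--pass=stdout"]
```

The resulting list could be handed to subprocess.run() with stdout redirected to /tmp/test.rwf, using datetime.now(timezone.utc) as the current time.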

I want to see if this data is as useful as Suricata’s flow records in JSON. Suricata’s flow records have a flow_id, which I can match with the flow_id of an alert; EVEbox works this way. This lets me quantify the network traffic associated with a security alert. For alerts associated with Malware, this is useful information. It answers the usual question: can you associate a breach with this? To answer such an analysis task, you can grab the Carbon Black netconn events and sum up the network traffic they caused on the infected client. The more data it is, the more likely… meh. :)
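To sketch the Suricata side of that comparison: EVE alert and flow events carry the same flow_id, so the bytes of an alerted flow can be looked up by that key. Here is a minimal Python sketch, assuming EVE JSON lines in which flow events store their counters under flow.bytes_toserver and flow.bytes_toclient. It is an illustration only, not how EVEbox implements it, and the helper name is hypothetical.

```python
import json


def bytes_per_alerted_flow(eve_lines):
    """Map each alert's flow_id to the total bytes of the matching EVE flow event.

    Alerts whose flow event is not (yet) in the input are left out.
    """
    alerted = set()
    flow_bytes = {}
    for line in eve_lines:
        event = json.loads(line)
        flow_id = event.get("flow_id")
        if flow_id is None:
            continue
        if event.get("event_type") == "alert":
            alerted.add(flow_id)
        elif event.get("event_type") == "flow":
            counters = event.get("flow", {})
            flow_bytes[flow_id] = (counters.get("bytes_toserver", 0)
                                   + counters.get("bytes_toclient", 0))
    return {fid: flow_bytes[fid] for fid in alerted if fid in flow_bytes}
```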

This is just a PoC

Let’s pipe the JSON data into Sumo this way for now.

ToDo

  • create an RFC compliant timestamp for the Netflow JSON serializer
  • find a way to run this every minute (the rwfilter queries need to take the activity timeframes as well as sTime and eTime into account). A timestamp field should make sense for the data here.
  • compare if Suricata’s EVE.json flow records are more useful than the serialized SiLK records
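For the first ToDo item, here is a minimal Python 3 sketch of an RFC 3339 timestamp. It assumes that PySiLK hands out rec.stime and rec.etime as naive datetime objects holding UTC, which fits the UTC timezone support in the configure summary above; rfc3339_utc is a hypothetical helper. Under the Python 2.7 interpreter this SiLK build uses, timezone and timespec are not available, and dt.strftime("%Y-%m-%dT%H:%M:%S") + "Z" would be the fallback.

```python
from datetime import datetime, timezone


def rfc3339_utc(naive_utc):
    """Format a naive datetime that holds UTC as an RFC 3339 string with milliseconds."""
    return naive_utc.replace(tzinfo=timezone.utc).isoformat(timespec="milliseconds")
```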

The problem with Netflow here is that this isn’t just kernel.capture.stats data. These flows describe directed communication that may still be in progress, so at the time of the rwfilter query they aren’t reported with their final values. If the connection is still active, the amount of bytes between source and destination keeps growing, and ipt_netflow only exports it when the flow ends or the active timeout (1800s above) expires.
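Because a long-lived connection may be exported as several records (one per active timeout), one way to get per-connection totals is to merge records by 5-tuple. Here is a minimal Python sketch working on dicts shaped like the JSON serializer’s output above; within SiLK, rwuniq does this kind of keyed aggregation on the binary data directly. Merging by 5-tuple alone will also fold together separate connections that happen to reuse the same ports, and the helper name is hypothetical.

```python
def merge_by_five_tuple(records):
    """Sum packets/bytes of records sharing (sip, dip, sport, dport, protocol).

    Also keeps the earliest stime and the latest etime per key; the
    'YYYY-MM-DD HH:MM:SS' strings compare correctly as plain strings.
    """
    merged = {}
    for r in records:
        key = (r["sip"], r["dip"], r["sport"], r["dport"], r["protocol"])
        m = merged.get(key)
        if m is None:
            merged[key] = {"packets": r["packets"], "bytes": r["bytes"],
                           "stime": r["stime"], "etime": r["etime"]}
        else:
            m["packets"] += r["packets"]
            m["bytes"] += r["bytes"]
            m["stime"] = min(m["stime"], r["stime"])
            m["etime"] = max(m["etime"], r["etime"])
    return merged
```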

Results

  • We can set up a modern Netflow collector based on OpenSource tools
  • We can build an analysis VM which is able to generate Netflow data from mirrored traffic
  • We can install FlowBat, a nice SiLK web interface, as a Netflow dashboard
  • We can serialize Netflow data with Python and send it into a Log Management tool (WIP)
  • We can provide infrastructure to speed up Incident Response for network-focused attacks

What I really want to do is group networks and hosts by activity using Hidden Markov Models. This should be a great application for Machine Learning.


#2

For reference, in case you want to run FlowBAT under supervisord:

[program:FlowBat]
command=/bin/bash -c "/home/netflow/FlowBAT/bin/run"
directory=/home/netflow/FlowBAT
autorestart=true
environment=
        HOME=/home/netflow
stderr_logfile=syslog
stdout_logfile=syslog

#3

I’m not sure if this makes the delay from gathering the data to getting it into your SIEM too long, but you could adjust the command to look something like this:

rwfilter --type=all --protocol=0-255 --start-date=$(date -u +%Y/%m/%d:%H -d "1 hour ago") --pass=stdout > /tmp/test.rwf

That would give you the ‘bucket’ for the previous hour. Running that at, say, minute 5 of every hour would give you a full hour’s worth of data?

Cheers, Mike