Analyse EventLog, Syslog and Suricata's eve.json with Sumo Logic

log_management
saas
netflow
log-analysis


Just as powerful as Splunk, it is...

… or even more powerful. You can check it out: Sumo Logic.

Sumo Logic is a SaaS offering that accepts machine data and exposes a very powerful search query language to the user. It can be used for log analysis as well as for metrics analysis, so on top of the log processing features you also get performance monitoring features, such as CPU utilization, memory consumption and network throughput.

In the following I show how to use Sumo Logic:

  • with a large JSON file. I will show how to parse and aggregate the flow data. This way Suricata acts as your Netflow processor, which sends serialized JSON flow data to Sumo Logic. I will also show some visualizations you can generate (even with the free account) to get a network overview and identify network issues. Suricata's EVE JSON output contains a vast amount of information.
  • to analyse Windows EventLogs for VPN activity
  • to parse Apache access logs and to visualize them
  • and how to handle Linux Syslog (Rsyslog) to count root logins via PAM

How to get the data in...

Many of these modern solutions fail here. Sumo does not.

  • They have Java-based agents, which can open a local Syslog listener on a dedicated machine.
  • They also support remote Windows EventLog sources as well as local EventLog files.
  • The agent can retrieve files from the local disk, and via SSH.

I am posting the search queries here because there is a project on GitHub which allows you to utilize them for your own independent data analysis. A good workflow is to pipe logs via plaso / log2timeline to sumo (the Go binary built from the source code) and on to d3.js or the InfoVis toolkit, or to Tableau if you need it fast. In some cases I have also used IBM SPSS, but not for InfoSec purposes. For large-scale tasks I recommend Theano and/or TensorFlow, and probably (py)Spark. However, in the following I want to focus on the Sumo Logic service.

The service has a decent API, as you might have guessed; otherwise I would not write about it. Their Python API is very similar to the Splunk Python SDK imho. This becomes important for data enrichment tasks and automation. Logs need to contribute to the security processes and the overall security posture, because they contain relevant “trigger” information.

How do I work with Sumo

The agent installation is simple on Windows and Linux, and it gets configured via the Web UI. Now the data is in Sumo and it's a Splunk-style experience.

Work with JSON data

JSON is an important logging format and the most common data exchange format modern tools use nowadays. I have configured the Sumo Collector Agent to monitor the eve.json file and to send the data into their service. Here is what I get.

So what now? Fields aren’t extracted. Do I just regex-parse the data? Ehhhh… that could become a big task: hundreds of fields, complex nesting. No!

Let’s use the json support:

(_source="Suricata NIDS EVE JSON") | json auto extractarrays nodrop

We select the source and pipe it to the json operator, which extracts the whole multi-layered data set; nodrop keeps the messages that lack the JSON fields instead of discarding them.

Now you can see all the fields, which allows quick statistical analysis of the value distributions.
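
If you only need a handful of fields, you do not have to rely on json auto; you can also extract a single field explicitly. A minimal sketch (the alias pkts_toserver is my own choice, not something from the original setup):

(_source="Suricata NIDS EVE JSON") | json field=_raw "flow.pkts_toserver" as pkts_toserver nodrop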

We can aggregate the flow data:

(_source="Suricata NIDS EVE JSON") | json auto extractarrays nodrop | sum(%flow.pkts_toserver) group by src_ip | sort by _sum

We use one of these extracted fields and point into the nested flow object within the JSON record, and we group by source IP. The sum function adds up the packet counts reported in the flow.pkts_toserver field, which is an integer value. The percent sign is needed to reference a field whose name contains dots, i.e. a nested path in the JSON hierarchy. Took me some time to figure that out.

This will sum the packets up, within the selected time frame. Let’s put this into a doughnut:

Not all records in the JSON file have a flow.pkts_toserver field. Because of this, not all entries from the source can be included in the aggregation. That is the reason for the accuracy warning.
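
If the warning bothers you, you can restrict the search to records that actually carry the field before aggregating. A sketch, assuming the isNull operator accepts the %-referenced field:

(_source="Suricata NIDS EVE JSON") | json auto extractarrays nodrop | where !isNull(%flow.pkts_toserver) | sum(%flow.pkts_toserver) group by src_ip | sort by _sum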

We can also aggregate our data per time slice. This is important if we want to plot the throughput.

(_source="Suricata NIDS EVE JSON") | json auto extractarrays nodrop | timeslice by 1m | sum (%flow.bytes_toserver) group _timeslice

Practically this is the same query. However, we group the flow.bytes_toserver sums per time slice. The slice is 1m, one minute. Let’s make a bar chart with one bar per minute.

And there we have our first anomaly already. :slight_smile: What was that traffic spike in that minute? Time to find out… use the other Suricata EVE JSON event types. You can also make a line chart or something like that, and overlay multiple lines for the top talkers to add a relative dimension, or group them by the top source IPs. Go for it.
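
One way to get that per-talker overlay (a sketch, not tested against this data set; transpose turns each src_ip into its own series):

(_source="Suricata NIDS EVE JSON") | json auto extractarrays nodrop | timeslice 1m | sum(%flow.bytes_toserver) as bytes by _timeslice, src_ip | transpose row _timeslice column src_ip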

Count here, sort there

(_source="Suricata NIDS EVE JSON")
AND "event_type\":\"alert\""
| json auto nodrop | count by %alert.signature

This is a count of the ET Open IDS alert signatures seen over 24h, with one row per distinct signature.

Pretty straightforward, I’d say.
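
A hedged variation: if you also want to know how many distinct sources triggered each signature, count_distinct can be used (the alias sources is mine):

(_source="Suricata NIDS EVE JSON")
AND "event_type\":\"alert\""
| json auto nodrop | count_distinct(src_ip) as sources by %alert.signature | sort by sources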

Linux Syslog

I put the collector in a dedicated VM, configured it to open a TCP port, and told Rsyslog to forward all incoming Syslog messages via TCP from port 514 to 1514. That way the logs of the Linux system daemons from multiple systems were collected centrally.

Now let’s check how many root logins we have seen in the last 24h.

(pam
AND "user root")
AND !CRON | count by _sourceHost
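
To see when these root logins happened, the same search can be sliced over time, along the lines of the flow queries above (a sketch):

(pam
AND "user root")
AND !CRON | timeslice 1h | count by _timeslice, _sourceHost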

webdeb is a KVM guest… running an Apache web server. Time to analyze its access logs.

Access logs via SSH

I created a Debian user called logger and put it into the adm group, so that it can read /var/log/apache2/*.log. The Sumo Collector Agent logs in with an SSH key, grabs the log lines, and adds them to the records. There is a Sumo App for Apache Access Logs, so this is the simplest analysis you can perform with Sumo.

_sourceCategory=Apache 
| parse regex "^(?<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})" nodrop
| parse regex "(?<method>[A-Z]+)\s(?<url>\S+)\sHTTP/[\d\.]+\"\s(?<status_code>\d+)\s(?<size>[\d-]+)" nodrop
| parse regex "(?<method>[A-Z]+)\s(?<url>\S+)\sHTTP/[\d\.]+\"\s(?<status_code>\d+)\s(?<size>[\d-]+)\s\"(?<referrer>.*?)\"\s\"(?<user_agent>.+?)\".*" nodrop
| if(status_code matches "2*", 1, 0) as successes 
| if(status_code matches "3*", 1, 0) as redirects 
| if(status_code matches "4*", 1, 0) as client_errors 
| if(status_code matches "5*", 1, 0) as server_errors 
| timeslice by 1h 
| sum(successes) as successes, sum(client_errors) as client_errors,  sum(redirects) as redirects, sum(server_errors) as server_errors by _timeslice
| sort by _timeslice asc

The reason why I show this here is that it is an example of regex usage and conditional logic. Graylog2 does not have this. Neither does Kibana 4, unless you are very crafty with the Elasticsearch DSL. Splunk can do this, but it’s more complex.

Take a look at the error codes stacked in a bar chart over time within a nice stylish dashboard, which has all the other visualizations as well:

So… what about EventLogs? There is a VPN activity tab in that dashboard as well.

EventLog analysis - next next finish

(VPN)
| parse using public/windows/2008 | timeslice 24h | count by msg_summary | sort by _count

Yes, you can just parse EventLog with parse using.

LogReduce and LogCompare

Sumo has a focus on analysis functions such as LogReduce. This means that you can use their ML-based tools to limit the amount of logs you have to read: you get the major patterns and differences instead, and you can rate them. This is mostly useful for system administrators.
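
LogReduce is just another operator in the query language. A minimal sketch against the Apache source category used above:

_sourceCategory=Apache | logreduce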

Ok, where is the catch?

I haven’t seen:

  • 2-factor authentication - they only allow you to restrict logins to allowed IP ranges (edit: with SAML you can use a service to do that, but it’s not directly integrated)
  • they support SAML, but not ADFS (is that the same thing?) (edit: it is)
  • how well the Sumo Collector Agent works with massive Syslog ingest (edit: that is documented)
  • whether the mail alerts are reliable, but since it has an API chances are good that a script within a cron job can apply some decent evaluation and data enrichment before the validated alert gets forwarded to PagerDuty or something similar.
  • I have seen slow searches, though!

Results

If the performance of this service gets better with a paid subscription, it can be practical for some gigabytes of logs per day, regardless of the format, as I have shown.

The fact that Sumo also processes performance metrics is more than decent, especially for a NIDS like Suricata. Think of the added value if you stream the Linux kernel’s procfs performance data as well. Spotting issues can become a one-second job this way.

Sumo’s visualizations are state of the art, and the query language is very powerful. This makes the biggest difference. Projects like Graylog2 and Kibana 4 suffer from missing features, such as conditional logic, mathematical expressions and comparison operators. Graylog2 does not even have support for regular expressions, due to Elasticsearch. Sumo has all of these features.

The service quality seems to be fairly high, and from what I can see they process a lot of data in AWS. So chances are actually good that they can boost performance for paying customers. They have a lot of apps which make it easy to understand the search query language hands-on. For example, geo-tagging fields in Apache logs is a one-minute endeavor (see the sketch below). They also have security analytics functions, which look like they can compete with Splunk Enterprise Security. I haven’t been able to test this yet, but maybe I will.
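
For illustration, the geo-tagging boils down to a lookup against Sumo’s built-in geo database. A sketch, assuming the geo://location lookup source and reusing the src_ip field parsed in the Apache query above (the exact lookup path and field list depend on the Sumo version):

_sourceCategory=Apache
| parse regex "^(?<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})" nodrop
| lookup latitude, longitude, country_code from geo://location on ip = src_ip
| count by country_code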

