Beats - ELK inputs simplified - for IT wide log management

ELK inputs simplified

Until now, getting data into ELK was not simple. Common solutions like Splunk have agents, which are easy to configure. Agents are useful for managing log forwarding centrally and for applying format and encoding settings per source.

Let’s take a hands-on look at Elastic Beats, because it appears to be very promising:

  • get server network and network device flows in correlation (Packetbeat and Logstash Netflow v9)
  • get performance data from Windows and Linux systems
  • get Windows Eventlog and Linux Syslog with a slim agent (Filebeat)
  • parse application specific data for statistical analysis (JSON for example)
  • handle different logging formats with multiple input streams (Syslog and JSON - one agent)
  • perform log post processing on the server (GeoIP on webserver access logs for example)
  • visualize logs with Kibana 4
  • or do what you want with the data in Elasticsearch
  • Bonus: use standard log analysis tools

Filebeat

Filebeat takes every file input (from Rsyslog etc) and feeds it into Elasticsearch or Logstash.

It works for common Linux distributions (source and binary packages are available) and Windows, so it’s possible to forward application log files from Windows tools with Filebeat as well. These log files can be C:\foobar.log or whatever you like to have in ELK.

On Linux I can easily forward Syslog with Beats to ELK:

As you can see, there is a Syslog message and there are the Syslog fields. The log message is received via “beats”. The magic sits between the Beats agent and the Logstash config here:

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # To fetch all ".log" files from a specific level of subdirectories
      # /var/log/*/*.log can be used.
      # For each file found under this path, a harvester is started.
      # Make sure no file is defined twice as this can lead to unexpected behaviour.
      paths:
        - /var/log/*.log
        - /var/log/messages
        #- c:\programdata\elasticsearch\logs\*

A prospector is a kind of file input / stream. You specify a path and, later on, a codec / encoding (like UTF-8). The input_type is either log or stdin for now. The difference is that for log, the position in the file needs to be remembered and log rotation events need to be handled.
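
A minimal sketch of such a prospector with an explicit encoding and input_type - the values here are just examples, not requirements:

filebeat:
  prospectors:
    -
      paths:
        - /var/log/*.log
      # file encoding, e.g. utf-8 or plain
      encoding: utf-8
      # "log" tails files and remembers the offset, "stdin" reads from standard input
      input_type: log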

      # Type to be published in the 'type' field. For Elasticsearch output,
      # the type defines the document type these entries should be stored
      # in. Default: log
      document_type: syslog

Now let’s tell the log server (Logstash / Elasticsearch) that the document type is syslog, so the fields can get filtered:

more /etc/logstash/conf.d/10-syslog.conf
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

Fair enough, ain’t it?

Set the document type to “syslog” and it gets filtered as syslog. Once you have the fields, you can do what you want with Kibana 4 and its visualizations.
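
For completeness: on the Logstash side there has to be a Beats input that the agents connect to. A minimal sketch - the port is just the commonly used example:

input {
  beats {
    port => 5044
  }
}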

So my JSON file...

A JSON formatted log file would simply be another prospector, and you can tag it with the document type of your choice:
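
On the Filebeat side this could look roughly like the following - the path is the Suricata EVE default and just an assumption here:

    -
      paths:
        - /var/log/suricata/eve.json
      document_type: SuricataIDPS

On the Logstash side, the matching filter: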

more 12-suricata.conf 
filter {

  if [type] == "SuricataIDPS" {
    # parse the JSON payload first, so the fields used below actually exist
    json {
      source => "message"
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
    ruby {
      code => "if event['event_type'] == 'fileinfo'; event['fileinfo']['type']=event['fileinfo']['magic'].to_s.split(',')[0]; end;"
    }
  }
}

In this snippet you can see that the field message (assuming it is JSON encoded) gets parsed and the fields get populated. The important point is that the document type is unique and matches the filter condition.

Filebeat will queue up log entries, and it’s also able to apply TLS. The queueing is important to avoid log gaps during Elasticsearch maintenance. In most cases you don’t want to host a highly available cluster if you can avoid it, and log queues can be a headache.
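
For reference, a sketch of how a TLS-enabled Logstash output could look in filebeat.yml - the host name and certificate path are assumptions:

output:
  logstash:
    hosts: ["elk:5044"]
    tls:
      certificate_authorities: ["/etc/pki/tls/certs/logstash.crt"]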

Packetbeat

Packetbeat is like Wireshark (it’s a sniffer), and it stores the captured and dissected packet info (from libpcap) in ELK. You can define filters for certain kinds of traffic, like MSSQL, MySQL and PostgreSQL, and filter by IP range in Kibana (SQL traffic to external destinations, for example).
I would like to specify an IP range within the Packetbeat agent itself, to only log DBMS traffic which leaves the internal network / perimeter.
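
A sketch of the relevant part of packetbeat.yml, assuming the default DBMS ports:

interfaces:
  device: any
protocols:
  mysql:
    ports: [3306]
  pgsql:
    ports: [5432]
  mssql:
    ports: [1433]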

This can be very useful for network forensics and / or (VPC) performance analysis. The amount of data can be high, but you can use curator for index retention management.

Topbeat

Topbeat serializes procfs (on Linux) into ELK for performance stats. It does something similar for Windows as well.
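
A sketch of the corresponding topbeat.yml input section - the period and process list are just example values:

input:
  # sampling period in seconds
  period: 10
  # regular expressions matching the processes to report on
  procs: [".*"]
  stats:
    system: true
    proc: true
    filesystem: true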

This way your ELK stack is not only a network analysis and log analysis system, it’s also capable of performance and bottleneck analysis.

Winlogbeat

Of course you also need Windows Eventlogs. Winlogbeat is made specifically for Windows Eventlogs and forwards them for you - with a sane setup. Unlike NXLog, it works perfectly well (in my experience).

You can get GPO logs, Windows Security Events etc. The filters are straightforward and easy to manage:
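
A minimal winlogbeat.yml sketch - the event log names and the Logstash host are assumptions:

winlogbeat:
  event_logs:
    - name: Application
    - name: Security
    - name: System
output:
  logstash:
    hosts: ["elk:5044"]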

Netflow

Logstash supports Netflow v5 and v9, so you can now analyze server network issues and network device flows in conjunction with each other.
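
A sketch of the corresponding Logstash input, using the netflow codec - the UDP port is just the conventional one:

input {
  udp {
    port => 2055
    type => "netflow"
    codec => netflow {
      versions => [5, 9]
    }
  }
}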

This is something I recommend to any versed operations team, because having such a debugging tool at practically zero cost is brilliant - especially when you face modern issues that sit in between load-balanced and distributed systems.

Modern devices might only support IPFIX or sFlow (like F5 BIG-IP boxes). I recommend using SiLK’s rwflowpack for this; there is a table you can check out. You can serialize rwfilter output into ELK. It’s a bit hacky at the moment, but I am sure there will be a simpler way soonish[tm].
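
One hacky way, as a sketch: dump flows to CSV with SiLK’s command line tools and let Filebeat pick the file up. The selection switches, field list and path are assumptions and depend on your repository layout:

rwfilter --type=all --start-date=2016/01/01 --protocol=0-255 --pass=stdout \
  | rwcut --fields=sIP,dIP,sPort,dPort,protocol,bytes,packets,sTime \
          --no-titles --delimited=',' \
  >> /var/log/silk/flows.csv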

Bonus: it's just HTTP and JSON

Searching can be done with standard core utils from the command line. You can keep using awk, sed, and of course grep.

Just curl the data (example here):

% # easypeasy:
curl 'http://elk:9200/filebeat-*/_search?q=*:*' | jq '.hits.hits[]._source.message' | egrep sshd

Of course you can extract the JSON fields individually and “grep” for your funky JSON keys directly with jq. The whole point of ELK is to give you unified data access across all (IT) domains.
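
To pull out individual fields - for example the Syslog fields the grok filter above creates - a sketch like this works (index pattern and field names as used earlier in this post):

curl -s 'http://elk:9200/filebeat-*/_search?q=type:syslog&size=50' \
  | jq -r '.hits.hits[]._source | "\(.syslog_hostname) \(.syslog_program): \(.syslog_message)"'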