Handle Syslog with fluentd - real syslog

Tags: log-parsers, fluentd, elasticsearch


What is syslog?

Good question.

So what is this syslog standard that everyone treats as a blanket permission to submit garbage to a log system? Can’t we just use JSON and HTTP? In 2016?

Fluentd doesn't get syslog

Unless you use the newsyslog plugin. With it you don’t need to meddle with Rsyslog templates, or even the semi-proprietary syslog-ng, and you can set up a local receiver for appliances like network or security devices that offer nothing else. However, the rule of thumb for a sane logging environment is: if there is a structured logging format, use it. If there is JSON, use JSON. If there is LEEF, use LEEF. If there is CSV… you get it.
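
For example, if an application already writes JSON lines to a file, you can tail and parse it directly and get structured records for free. A minimal sketch — the path, position file and tag are made up for illustration, and option names can differ slightly between Fluentd versions:

<source>
  @type tail
  # hypothetical application log with one JSON object per line
  path /var/log/myapp/app.json.log
  # remembers how far the file has been read across restarts
  pos_file /var/log/td-agent/app-json.pos
  tag app.json
  <parse>
    @type json
  </parse>
</source>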

Configuring the Syslog source in Fluentd is simple:

<source>
  @type newsyslog
  port 50514
  bind 0.0.0.0
  tag syslog
</source>

It’s very easy to ship the syslog events into Elasticsearch:

<match syslog.**>
  @type elasticsearch
  host 127.0.0.1
  port 9200
  logstash_format true
  buffer_type memory
  flush_interval 60
  retry_limit 15
  retry_wait 1.0
  num_threads 2
</match>

Message post-processing and enrichment

One of Fluentd’s strong points is that you can apply parsers to messages — JSON, key-value, CSV and so on — and transform records with small Ruby snippets on the fly. Internally, Fluentd treats every event as a JSON-like record, so you can use the record_transformer filter to enrich or post-process logs.
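
As a sketch, assuming a reasonably recent Fluentd with the bundled record_transformer filter and that the raw syslog line ends up in a field called message (an assumption, not taken from the configs above), enrichment could look like this; the added fields are purely illustrative:

<filter syslog.**>
  @type record_transformer
  # allow small Ruby snippets in the placeholders below
  enable_ruby true
  <record>
    # static enrichment field, purely illustrative
    environment production
    # the Fluentd tag the event arrived with
    fluentd_tag ${tag}
    # Ruby one-liner; assumes the raw line sits in a field called "message"
    message_length ${record["message"].to_s.length}
  </record>
</filter>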

Results

Two things to take away are:

  1. Handling syslog with a modern log stack built on Elasticsearch and Fluentd requires a plugin. This is obvious enough once you read the Fluentd documentation.
  2. If you have to use syslog, at least run a parser over the message (a sketch follows right after this list). Otherwise no fields get populated in the Elasticsearch index, you cannot use Elasticsearch queries efficiently for log analysis, and the index data cannot be compressed as effectively. Do not feed random garbage into Elasticsearch if you want security or business intelligence based on your logs.
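
A minimal sketch of such a parser step, assuming a reasonably recent Fluentd with the bundled parser filter and that the raw line sits in a message field; it has to be placed before the elasticsearch <match> so it runs first:

<filter syslog.**>
  @type parser
  # field that holds the raw syslog line (an assumption, adjust to your input)
  key_name message
  # keep the original fields next to the parsed ones
  reserve_data true
  <parse>
    # built-in syslog parser; populates fields such as host, ident, pid and message
    @type syslog
  </parse>
</filter>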

A common mistake is that companies migrate from Splunk to ELK without structuring the log messages with appropriate parsers. At first this sounds like something you can do later, or something a skilled admin with a grasp of regular expressions can live without. That is not the case: Elasticsearch without a properly structured index is just a giant blob, and blobs don’t produce insights.