What is syslog?
- RFC 5424 - https://tools.ietf.org/html/rfc5424 - also known as "new syslog"
- RFC 3164 - https://www.ietf.org/rfc/rfc3164.txt - also known as "BSD syslog"
So what is this syslog standard everyone refers to? Essentially a transport vehicle for submitting garbage to a log system. Can't we just use JSON and HTTP? In 2016?
Fluentd doesn't get syslog
Unless you use the newsyslog plugin. With it you don't need to meddle with Rsyslog templates, or even the semi-proprietary syslog-ng. You can set up a local receiver for appliances like network or security devices which do not offer anything else. However, the rule of thumb for a sane logging environment is: if there is a structured logging format, use it. If there is JSON, use JSON. If there is LEEF, use LEEF. If there is CSV… you get it.
Configuring the Syslog source in Fluentd is simple:
```
<source>
  @type newsyslog
  port 50514
  bind 0.0.0.0
  tag syslog
</source>
```
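To verify the receiver actually accepts messages, you can fire a test event at it from the same host. A minimal smoke test, assuming the util-linux version of logger and the UDP listener configured above:

```
# send one datagram to the local Fluentd syslog receiver on port 50514
logger -d -n 127.0.0.1 -P 50514 "test: syslog receiver smoke test"
```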
Shipping the syslog events into Elasticsearch is just as easy:
```
<match syslog.**>
  @type elasticsearch
  host 127.0.0.1
  port 9200
  logstash_format true
  buffer_type memory
  flush_interval 60
  retry_limit 15
  retry_wait 1.0
  num_threads 2
</match>
```
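With logstash_format enabled, the plugin writes into daily logstash-YYYY.MM.DD indices, so a quick way to confirm that events are flowing is a count query. A sketch, assuming Elasticsearch on localhost with the default index prefix:

```
# count documents across the logstash-* indices to confirm ingestion
curl -s 'http://127.0.0.1:9200/logstash-*/_count'
```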
Message post-processing and enrichment
One of the strong points of Fluentd is that you can apply parsers for formats like JSON, key-value pairs, CSV and so on to the messages. It's also possible to transform records with simple Ruby expressions on the fly. Fluentd uses JSON as its internal transport format, so you can use the record transformer to enrich or post-process logs, as sketched below.
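A minimal sketch of such an enrichment step, using Fluentd's built-in record_transformer filter. The field names (environment, received_at) are made up for illustration:

```
<filter syslog.**>
  @type record_transformer
  enable_ruby true
  <record>
    # static enrichment: stamp every event with an (example) environment name
    environment production
    # dynamic enrichment via inline Ruby: the event timestamp as a string
    received_at ${time.to_s}
  </record>
</filter>
```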
Two things to take away are:
- Handling syslog with a modern log stack comprised of Elasticsearch and Fluentd requires a plugin. This is entirely obvious once you read the Fluentd documentation.
- If you have to use syslog, at least run a parser over the messages (see the filter sketch after this list). Otherwise no fields get populated into the Elasticsearch index, you cannot use Elasticsearch queries for log analysis efficiently, and you cannot easily apply compression to the index data. It's essential that you don't feed random garbage into Elasticsearch if you want security or business intelligence based on logs.
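A minimal sketch of such a parsing step, using Fluentd's parser filter (newer configuration syntax; older v0.12 setups used the fluent-plugin-parser gem instead). The field names and the regular expression are made up for illustration:

```
<filter syslog.**>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type regexp
    # example pattern: split "sshd[123]: Failed password for alice"
    # into program, pid and msg fields
    expression /^(?<program>[^\[]+)\[(?<pid>\d+)\]: (?<msg>.*)$/
  </parse>
</filter>
```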
A common mistake is that companies migrate from Splunk to ELK without taking care to structure the log messages with appropriate parsers. At first this sounds like something you can do later, and like something a skilled admin with some understanding of regular expressions could get by without. That is not the case: Elasticsearch without a properly structured index is like a giant blob. And blobs don't produce insights.