- Are people coming into the infosec office and 'miring at your dashboards?
- Do your executives reply to security reports with “one more!”?
- Are sysadmins asking if you need HA storage for your systems, just because they want to rack it in?
- Are developers asking if they can develop mobile alerting apps for your team, because it would be fun?
No? Why not?
So I will just collect syslog and...
… all these other standards… and… fail. Few commercially viable security tools use standard (IETF) syslog. Instead they often use the syslog port (it’s syslog, see?!) and send some ASCII data over TCP. Usually the logs are unstructured and make (some) sense for an engineer. But having a full-time job reading logs is a quick way into the “it’s just syslog sanitarium”. It’s that place where you meet the full-time RegEx engineer, Grok programmers and MongoDB DBAs. These people.
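To see how far the "it's syslog, see?!" claim goes, here is a minimal sketch that checks whether a line even carries the IETF (RFC 5424) header. The regex only covers the header layout, the function name `parse_syslog_line` and both sample lines are made up for illustration, and a real parser would also have to handle structured data and framing.

```python
import re

# RFC 5424 header: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID, then the rest.
RFC5424_HEADER = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>[1-9]\d?) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)$",
    re.DOTALL,
)


def parse_syslog_line(line):
    """Return the header fields if the line looks like RFC 5424, else None."""
    match = RFC5424_HEADER.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    pri = int(fields.pop("pri"))
    if pri > 191:  # highest valid PRI is facility 23 * 8 + severity 7
        return None
    # PRI encodes facility * 8 + severity.
    fields["facility"], fields["severity"] = divmod(pri, 8)
    return fields


# Hypothetical sample lines: one standard header, one "ASCII over the syslog port".
standard = "<34>1 2024-05-01T10:00:00Z fw01.example.com su - ID47 - 'su root' failed"
vendor = "ALERT|Boom|fw01|blocked 10.0.0.5 -> 10.0.0.9"

for sample in (standard, vendor):
    print(parse_syslog_line(sample) is not None)
```

Everything that comes back as `None` is the part that ends up on the desk of the full-time RegEx engineer.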
So you cannot read logs? I mean in real-time? - That is the reason why logs need to be searchable. But search engines aren’t exactly simple. There are hundreds of tools out there which take logs, store them in a database, and expose some query operations.
Sadly none of them work very well. Take a quick look at the DSL for Elasticsearch for nested queries (which you would need for correlation). Does this look like something I want to type during an incident response on a commandline using Vim and curl? Does Kibana (the most common Elasticsearch front) support this? No? Why not?
Or have you taken a look at Graylog2’s search engine? Does this look like something with better capabilities than a 5-year-old trying to get into this reading thing? No? Why not?
If it’s like that I am much faster using a central Rsyslog server. Grep, Awk and sed on files are much better suited for log analysis than the frontends for Elasticsearch (and I count Graylog2 among these). I am not saying that ELK / Graylog2 are useless, but for analysis they actually are. Just because there are some pretty charts here and there that doesn’t mean the information can be extracted. If you cannot define the search, why do you fill the disks with data at all? No one is going to develop the searches for you. You need to do that. Full time. And join the “it’s just syslog sanitarium” as a full time Elasticsearch DSL programmer.
Wait a minute, Splunk!
I know, right? Easy. All these security nutjobs swear by it. SOCs are staffed around Splunk skills. Skills, you know. The most valuable thing to have. Product skills with ZERO domain expertise. Splunk. The way Splunk (the company) operates these days can be summed up by this video.
It’s sales, sales and sales. And not much more. In the Gartner Magic Quadrant IBM QRadar is shown as the leader. The future of Splunk (sadly) can be summed up if you print out the Gartner article, and draw a big red line from the Splunk dot to the bottom left.
The Enterprise Security (ES) add-on for Splunk only works if 99.9% of the conditions are met. Unlike other, more flexible solutions. Like Rsyslog. Or grep.
Splunk (without ES) is not really a SIEM. It’s a search engine for textual data, which can pipe the results. Either towards deeper searches or interactive visualizations. ES has the common functions like asset management, log normalization and, of course, correlation. But who cares about that? Only people with domain expertise and these people cannot be hired, because the market aeh… because HR… doesn’t have them.
Who is this correlation?
Essentially correlation is a buzzword used to describe an analysis task, which is supposed to make sense of multiple factors. These factors can be data inputs (log sources) or searches (multi conditional). Usually you want to make assertions on the results, and conclude with severity indicators to rank results. In order to do this logs need to be normalized and… more importantly… enriched.
This enrichment can be simple if you only need (reverse) DNS lookups. But if you need aggregation and sliding statistical indicators, data model based field supplementation etc. it can be more complex than theoretical physics. And while we are at it, robust statistical models for local or global anomaly detection aren’t even covered by the standard curriculum of applied Computer Science. Welcome to the real world! What? You just hired “a guy” with 2 years of “security expertise”? Can he do that? No? Why not?
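For a taste of what "sliding statistical indicators" can mean, here is a minimal sketch of one robust indicator: a modified z-score built on the median and the median absolute deviation (MAD) over a sliding window. The window size, the 3.5 threshold, the function name and the sample numbers are illustrative assumptions, not a recommendation for any particular SIEM.

```python
from statistics import median


def sliding_anomalies(series, window=6, threshold=3.5):
    """Return (index, value) pairs that are outliers versus the preceding window.

    Uses the modified z-score 0.6745 * (x - median) / MAD, which is less
    sensitive to earlier spikes than a mean / standard deviation score.
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        center = median(history)
        mad = median(abs(v - center) for v in history)
        if mad == 0:
            continue  # flat history: no usable scale, skip instead of dividing by zero
        score = 0.6745 * (series[i] - center) / mad
        if abs(score) > threshold:
            flagged.append((i, series[i]))
    return flagged


# Hypothetical events-per-minute counts from one log source.
events_per_minute = [100, 104, 98, 101, 99, 103, 102, 950, 100]
print(sliding_anomalies(events_per_minute))
```

Note that the spike at index 7 sits in the history window for index 8 and still does not drag the baseline along. That is the whole point of using the median, and it is exactly the kind of detail "a guy" with 2 years of experience has to know.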
Correlation for SIEMs is supposed to give you a top list of issues. For example if one asset has multiple IDS, endpoint security, DLP and event log anomalies you might want to start calling the anomaly an alert. But the usual “Bad IP” or “TOR IP” IDS alert doesn’t qualify to get the sirens in the SOC started. Unless it’s accompanied by some sort of throughput or latency variation, which can be spotted by the SIEM via Netflow or Network IDS integration. Uh wait… wasn’t that exactly that position in the offer, which had the zeros before the dot or the comma? Damn you people with domain expertise. I hate your requirements!
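The "top list of issues" idea can be sketched in a few lines: count how many independent sources complain about the same asset and only call it an alert above a threshold. The source names, the weights, the `min_sources` cut-off and the sample events are all hypothetical; a real SIEM would need normalized and enriched events before this step makes any sense.

```python
from collections import defaultdict

# Hypothetical weights: a throughput anomaly from Netflow counts more than a "Bad IP" hit.
SOURCE_WEIGHTS = {"ids": 1, "eventlog": 1, "endpoint": 2, "dlp": 2, "netflow": 3}


def rank_assets(events, min_sources=3):
    """Rank assets by weighted count of distinct sources reporting anomalies.

    events: iterable of (asset, source) tuples.
    Returns a list of (asset, score, is_alert), highest score first.
    """
    sources_by_asset = defaultdict(set)
    for asset, source in events:
        sources_by_asset[asset].add(source)  # repeats from one source count once

    ranked = []
    for asset, sources in sources_by_asset.items():
        score = sum(SOURCE_WEIGHTS.get(source, 1) for source in sources)
        ranked.append((asset, score, len(sources) >= min_sources))
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked


events = [
    ("ws-042", "ids"),
    ("ws-042", "endpoint"),
    ("ws-042", "netflow"),
    ("ws-042", "ids"),
    ("srv-db1", "ids"),
]
for row in rank_assets(events):
    print(row)
```

The lonely IDS hit on `srv-db1` stays an observation. The workstation with three sources agreeing is the one worth a siren.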
So I need this correlation to rank my observations. What if I don't have all the data in one solution?
This is where it gets tricky. From an engineering perspective custom integration isn’t alien to the job description. “I can do this, just give me an SDK and I will commit code. Man, I have 10 years of coding experience…” Good for you. Relevance: ZERO.
Because from a sales perspective this doesn’t make any sense: instead of a cheap integration they sell an additional product (piece). So the SIEM business model is making sure that the customer boils in a pot of complexity, so that he comes back and buys XYZ. Which can be more “Events Per Second” or “Add On X”. If you develop your custom SDK-based SIEM add-on, sadly, sadly, sadly, due to warranty reasons… you know the drill. So sorry, the support case problem in the Salesforce ticketing system could be related to your own customization. Why? Why not?
You get very interesting discussions if you have pretty graphs in Q1, but in Q2 a month of data is missing. You might really need the supported add on. Trust me.
A conflict arises: for AD DC logging the infrastructure team wants tool A, but the security team has tool B. Let’s solve this:
Battle of the nerds? - No.
Chess. Kidding!? Who knows the rules?
Okay, I have the solution.
My admins cannot handle the SIEM search
Too bad for the admins. Seriously.
Years ago I did an SAP training and I needed caffeine pills to stay awake because I didn’t like the coffee there. Coffee is an art, not a drink!
So it’s a bag of cheap pills (or bad coffee) and a training for the admins. That is the reason why there are health checks in the first place as part of the on-boarding procedure for new employees. You didn’t know?
In the real world it simply happens that you log twice. Because logs are too important to lose them. Are they?
Let’s say you have a 12 month retention policy and you get to 12 TB of data in your big-data style SIEM. The inputs are NIDS, network devices, Endpoint Security, AD, DHCP, DNS, Netflow / IPFIX, custom stuff and Boom DLP. Now Boom DLP explodes in your face after a signature update and somehow 2 million alerts flood the SIEM. 500 GB. Every now and then Boom has an issue like that, because… DLP is fun.
The day of days comes: the quarterly security report everyone has been waiting for. Once every 3 months the infosec department becomes visible and relevant. Woohoo, suit up and deliver! Black suits, black car… black backgrounds on the charts, yo!
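To put the Boom numbers into proportion, here is a quick back-of-the-envelope sketch. It assumes decimal units (1 TB = 1000 GB) and that the 12 TB accumulate evenly over the 12 months, neither of which is stated above; the function name is made up.

```python
def burst_impact(total_tb=12, retention_months=12, burst_gb=500, burst_alerts=2_000_000):
    """Relate a single alert flood to the overall SIEM volume (decimal units)."""
    total_gb = total_tb * 1000
    monthly_gb = total_gb / retention_months
    return {
        "monthly_gb": monthly_gb,
        "share_of_month": burst_gb / monthly_gb,
        "share_of_total": burst_gb / total_gb,
        # 1 GB = 1,000,000 KB in decimal units
        "kb_per_alert": burst_gb * 1_000_000 / burst_alerts,
    }


impact = burst_impact()
print(f"average month: {impact['monthly_gb']:.0f} GB")
print(f"one flood is {impact['share_of_month']:.0%} of a month")
print(f"one flood is {impact['share_of_total']:.1%} of the whole retention")
print(f"per alert: {impact['kb_per_alert']:.0f} KB")
```

Under these assumptions a single bad signature update is worth half a month of everything else combined, at roughly 250 KB per "alert". Do that every now and then and a good part of the disk budget goes to one misbehaving product.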
Sadly there are “Data Leakage Events” (sounds great, doesn’t it!) with no relation to actual security events (WTF), because they are false positives (“ehe, bullshit security”) due to data indexing and signature issues (Say what?). Simply because people (these people HR hires!) don’t want to sign emails and documents on their phones. And no one attends these security awareness trainings with caffeine pills. People attend them with their mobile phones, searching for lifting advice on their squatting form in the gym. Or they just decide to install an app which pushes the company mail to a social network platform. Cloud mobile p2p-based file sync workflow applications are very trendy!
And that’s when DLP gets hard: bad behavior and no awareness. Now you go to the meeting of meetings on the day of days and tell your executives of executives that all these data loss issues are false positives (because p2p cloud workflow tools, you know), because these expensive Boom security appliances are actually snakeoil. People do what they want in this company and infosec is not in charge. That is the problem. No? Why not?
Of course - great career advice :). Translated it means that your department wastes a large percentage of what is called EBITDA (I always have to look up what this means as well) and hmmh… wasting net revenue, making things expensive. Congratulations, you are where no one wants to be. (Fired?)
Well, I could have told you that it’s garbage in - garbage out. But you didn’t want to listen. Or maybe you did and you logged EVERYTHING. Who told you that? Sales?
Garbage in - charts out - because all the things
Back to the good old toddler days. Wasn’t it a shame that mummy took away the colorful toys to replace them with school books?
In the real world what simply happens is mathematical obfuscation with aesthetics… aeh data science workflows get applied. You have this data science wizard guy who also writes the business report (trusted fellow, yo). He likes these expensive wines (we would never ever…) and he can do stuff with data.
Toddlers aehh… managers with no domain insight… like it. Because it says everything is fine in nice colors. All the things say that. That is called credibility. A holistic security perspective!
Why not align the entire business process with metrics like that, so that you get a lot of charts, which say everything is fine, but “we need to improve”. Isn’t that called COBIT? Oh wait, no… COBIT was about control objectives. Not about objectives. My bad. It really is.
Let's get back to email
Last time I wanted to index and search the Outlook inbox, I became frustrated. It’s garbage in, but it never comes out again.
So chances are good no one else will (be able to) verify results if you dump alerts, logs etc. into mails, and “count” them. It’s not exactly lying, because you “don’t have the tools” to make better reports. And map-reduce - we all know that - is just best effort. It’s not about the exact results, it’s about getting them fast.
So one day before the day of days, when you meet the executives of executives, you get the intern of interns. To do the slides. Plausible deniability at zero cost. Finally the information security department delivers value.
To an inbox! Let’s hire that intern as our Junior Associate Grok DSL developer. - Because we need better tools.