Darktrace - Machine Learning Network Intrusion Detection System

qradar
dlp
intrusion-detection
siem
network_forensics
machine-learning

Please note that this article is neither sponsored nor part of any marketing campaign. I don’t benefit from it, although some aspects relate to industry products.


Looks can be...

Darktrace is not your usual Network Intrusion Detection System. A while ago I wrote an in-depth technical wiki post about Suricata, which is essentially like Snort / Sourcefire.

Darktrace and Suri have in common that they focus on network traffic analysis and do not decrypt SSL streams. Systems like Forcepoint (formerly known as Websense) inspect SSL-encrypted connections, which is often referred to as SSL interception via a PKI. Such a feature is common for Data Leakage Prevention, because this kind of security system needs to detect fingerprints and patterns in an (almost) gapless fashion.

All of the mentioned systems, Forcepoint, Suricata, Darktrace… have in common that they are Deep Packet Inspection (DPI) systems. But as we all know: the deeper we dive, the more we see. Looks can be… relevant. And Darktrace has something to offer that looks cinematic.

The question is: is this look deceiving? Or is Darktrace useful for real-world threat-hunting and Incident Response? Is the Machine Learning part of the snake oil package? Or is this practical?

What do we see here?

Darktrace does two things with the network ingest from a TAP or SPAN port:

  • similarity clustering and
  • anomaly monitoring.

In order to evaluate the product we have to keep this in mind. Machine Learning (ML) is not magic. It’s mathematics, with a certain emphasis on statistical models. - So we are hunting “cyber” criminals with math.
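To make the “math” part a bit more tangible, here is a minimal sketch of statistical anomaly monitoring: a z-score check of a new observation against a device’s own history. This is not Darktrace’s actual model, which is not documented in this article. The function name, the threshold of 3 standard deviations and the byte counts are assumptions for illustration only.

```python
from statistics import mean, stdev

def is_anomalous(history, observed, threshold=3.0):
    """Return True if `observed` deviates from the mean of `history`
    by more than `threshold` sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold

# Hypothetical bytes-per-minute baseline for one device.
history = [1200, 1100, 1300, 1250, 1150, 1220, 1180, 1270]

print(is_anomalous(history, 1260))   # False: within the usual range
print(is_anomalous(history, 98000))  # True: far outside the baseline
```

A real system would use far richer features than a single byte counter, but the principle is the same: learn what is normal per device, then score deviations.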

The posted screenshot shows how the start pages of the Web UI look: 3D, WebGL. Information is filtered down and some segments are blurred out, because this is from a hands-on evaluation in a real network.

What we see in the visualization screenshot is one device (my personal laptop) inside a monitored network. The laptop is called plug here.
It had a bunch of IP addresses over a period of time (blurred, but the history is tracked via the MAC address) and it connects to hosts such as pipe.skype.com etc… Not unusual in this network. Or in general.
At the bottom of the screenshot you see that there are some blurred out anomalies, which can be investigated. Alerts to work through.

But it goes one step further. I can list similar connection patterns. Here is an example of the similarity feature: Darktrace can list similar devices. I haven’t come across that feature in any other Network IDS yet: who else is doing similar “things”?

image

This is a network timeline analysis of bidirectional network throughput on port 80 to similar ASNs, like Google, Amazon etc. This is a very interesting way to cluster activities in a network, given the centrality of these internet services today.

A couple of devices are grouped to show that they did roughly the same thing at the same time: downloaded the same information, updated some software, did some web research.
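The grouping idea can be sketched with a simple similarity measure. The example below compares per-device traffic profiles (bytes per time bucket) with cosine similarity and lists the devices that behave like a given one. Cosine similarity is my choice for illustration; the article does not say which measure Darktrace uses. All device names except plug, and all numbers, are made up.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two equally long traffic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_devices(profiles, target, min_similarity=0.9):
    """Return (name, similarity) pairs for devices whose profile resembles
    the target's, most similar first."""
    reference = profiles[target]
    scored = [
        (name, cosine_similarity(reference, vector))
        for name, vector in profiles.items()
        if name != target
    ]
    matches = [pair for pair in scored if pair[1] >= min_similarity]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Hypothetical bytes on port 80 per time bucket, one vector per device.
profiles = {
    "plug": [10, 500, 480, 20, 0, 0],
    "laptop-a": [12, 520, 450, 25, 0, 0],
    "printer": [0, 0, 5, 0, 300, 310],
}

for name, score in similar_devices(profiles, "plug"):
    print(name, round(score, 3))
```

Here laptop-a is reported as similar to plug because both were busy in the same time buckets, while the printer is not.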

Verbosity - can you understand what you see?

Many Network IDS systems fall short when you want raw information, like packet captures in the PCAP format. First, let’s take a step back from the fancy UI and read some text. Do we want to investigate this in depth?

image

So what happened?

  • plug connected to the RSS feeds of this web site :wink: Good boy!
  • it connected to 172.97.57.85, which is owned by Bungie Inc. Since we also see Battle.net connections, chances are good someone played Destiny 2. This also explains deadorbit.net - which is a wiki for the game.
  • Apparently Outlook was connecting to Office 365 in the background, and Skype as well as Microsoft Teams
  • Something checked ClamAV signatures. So OpenSource AV is a thing?

Now I’d like to see what was sent to that 172.97.X.X IP. Can I get a PCAP?

Yes, you can download a (filtered) PCAP of the event, open it in Wireshark, and decrypt the SSL / TLS traffic there if you have the key material. You can also inspect it with the Web UI. At least to some degree.

But there are no signatures

So, there are no signatures? No RegEx? If you know the Emerging Threats Pro / Open rules, you know that the core idea is to define patterns. With Darktrace you have models instead.

But where is the SQL Injection Model?

Uh… well here are the SQLi rules for Snort / Suri. Obviously these don’t work with DT.

Does that mean that Darktrace aims to detect an SQL injection attack by detecting download anomalies during a data breach? - Not good. Last time I checked, attackers don’t try to download everything at once in one big chunk, causing a spiky anomaly. They exfiltrate data slowly, via multiple covert pivot points. To be redundant, to make it harder to investigate the attack, and to make it harder to pull the plug on it.
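The difference between a spiky download and slow exfiltration is easy to show with numbers. The sketch below is an illustration of the argument, not a description of any product’s logic: a per-interval spike threshold never fires on a steady trickle, while a running total per destination does. The limits and byte counts are assumptions.

```python
def spike_alerts(transfers, per_interval_limit):
    """Indices of intervals whose byte count alone exceeds the limit."""
    return [i for i, sent in enumerate(transfers) if sent > per_interval_limit]

def cumulative_alert(transfers, total_limit):
    """Index of the interval in which the running total first exceeds the
    limit, or None if it never does."""
    total = 0
    for i, sent in enumerate(transfers):
        total += sent
        if total > total_limit:
            return i
    return None

# Hypothetical exfiltration: 40 MB per hour, for 48 hours, to one destination.
slow_exfil = [40_000_000] * 48

print(spike_alerts(slow_exfil, per_interval_limit=500_000_000))  # []
print(cumulative_alert(slow_exfil, total_limit=1_000_000_000))   # 25
```

With several pivot points the attacker can split the trickle further, so even the running total has to be aggregated across destinations to be useful.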

I’m not saying that it’s impossible to detect SQL Injection attacks with Darktrace. You can define a model, and you can feed Darktrace SSL-offloaded traffic, for example from an F5 ClonePool. But in many cases you probably want a Web App Firewall, like mod_security, for this anyway. Given that your web front has access to the clear-text headers… You know the drill.

So no. This is not a typical IDS. It’s not a silver bullet solution. All it does is some Machine Learning for similarity classification and anomaly detection. It’s not a “perimeter” device that you feed with Ingress / Egress traffic from the edge routers, because for most of the external endpoints Darktrace cannot build models anyway due to the lack of regularity.

How does Darktrace compare to other systems?

Darktrace does more than the Forcepoint Network Agent (this is relatively new), which simply quantifies protocol usage by connection from a SPAN input. Forcepoint’s system is similar to Netflow because it improves visibility, but this can get correlated into the Triton Web DLP solution. That is nice, but at the end of the day it’s not an IDS at all.
If you have the hardware appliances you can also mirror the SSL-offloaded traffic to an IDS appliance. But I am not a great fan of that, given that this is invasive. Darktrace does not need this in order to work efficiently.

Darktrace does something different than Suricata / Snort / Sourcefire, because these NIDS systems are focused on DMZ and perimeter security, for production environments. These IDS systems should get the traffic from the edge routers, or Load Balancers. They are useful to investigate DDoS attacks, or to inspect HTTP(s) traffic for the mentioned web attacks, like SQL injection or other attacks from the OWASP top 10. Darktrace does not focus on this.

Darktrace - Alert quality

You have to resolve the alerts and interact with them to teach the underlying ML models what is normal and what is not. At first the majority of the alerts might relate to new devices that appear in the Guest WiFi, or to sudden changes in throughput in the backup network.

Later on the alerts are different from what a traditional IDS delivers. Instead of having to deal with False Positives from signature mismatches you are going to deal with Model exceptions.

I think over time you might have an easier workflow with the Models, but I have only tested the system for a couple of weeks so far. To answer the initial question: I think that the ML aspects are well integrated, and that we are not looking at snake oil wrapped in WebGL. :wink:

Darktrace - Logs and SIEMs

Many companies have Network IDS systems that are forgotten boxes with blinky lights in the co-location and office server room racks. Often the alerts are fed into a SIEM solution, like IBM QRadar, to make some sense of them. But in reality the credibility and magnitude of the IDS alerts get lowered, and they often don’t contribute to the overall security posture. This is a real problem in the security industry: we have boxes. But they don’t do enough.

Darktrace has a Syslog output (LEEF, CEF and JSON), so the solution can be integrated into most security stacks. For QRadar you’d simply write a quick LSX (Log Source Extension).
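For stacks without a ready-made parser, the CEF variant is simple enough to handle by hand. The following is a rough sketch of a CEF parser based on the general CEF layout (seven pipe-separated header fields followed by key=value extensions). The sample message is invented: the product name, signature ID, event name and extension keys are assumptions and not taken from real Darktrace output. It also ignores escaping inside extension values.

```python
import re

HEADER_FIELDS = [
    "version", "vendor", "product", "device_version",
    "signature_id", "name", "severity",
]

def parse_cef(line):
    """Parse one syslog line carrying a CEF message into a dict."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("not a CEF message")
    # Split on pipes that are not escaped with a backslash.
    parts = re.split(r"(?<!\\)\|", line[start + 4:], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    event = dict(zip(HEADER_FIELDS, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""
    # A value runs until the next " key=" or the end of the line.
    pairs = re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", extension)
    event["extension"] = dict(pairs)
    return event

# Invented example line, for illustration only.
sample = (
    "<134>Nov  3 12:00:00 dt-appliance "
    "CEF:0|Darktrace|DCIP|1.0|100|Unusual External Data Transfer|5|"
    "src=10.0.0.23 dst=172.97.57.85 out=52428800 "
    "msg=Large upload to rare destination"
)

event = parse_cef(sample)
print(event["name"], event["extension"]["out"])
```

In QRadar the equivalent mapping would live in the LSX; the sketch only shows which fields there are to map.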

DT comes with its own ELK-based advanced search, and the logs are structured into fields. It’s not an Elasticsearch garbage bin.

image

One of the key features for log analysis is that the logs contain the number of bytes transferred. That means you can prioritize security alerts in relation to the amount of sent / received data to arrive at a breach score.
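As a sketch of what such a prioritization could look like: the function below weights an alert’s severity by the order of magnitude of the outbound data. The formula, the field names (severity, bytes_out) and the sample alerts are my own assumptions for illustration; neither Darktrace nor QRadar is claimed to compute a score this way.

```python
from math import log10

def breach_score(alert):
    """Weight severity by the order of magnitude of outbound bytes.
    The logarithm keeps one huge transfer from drowning out everything."""
    volume_weight = log10(alert["bytes_out"] + 1)
    return round(alert["severity"] * volume_weight, 2)

# Hypothetical alerts with a 0-10 severity and outbound byte counts.
alerts = [
    {"id": "a", "severity": 8, "bytes_out": 2_000},
    {"id": "b", "severity": 5, "bytes_out": 900_000_000},
    {"id": "c", "severity": 3, "bytes_out": 50_000},
]

ranked = sorted(alerts, key=breach_score, reverse=True)
print([alert["id"] for alert in ranked])  # ['b', 'a', 'c']
```

The medium-severity alert with 900 MB sent out ranks above the high-severity alert that moved only 2 KB, which is the point of having byte counts in the logs.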

Example: What is the security analysis workflow for Incident Response and Threat Hunting?

The usual workflow I have is based on IBM QRadar, where the possible offenses are sorted by magnitude / potential impact. The logs from multiple systems are inspected in conjunction with each other. E.g. Active Directory, Darktrace, firewall and Netflow logs get correlated, historically, cyclically etc., and matched against threat feeds. You know the drill.

In short: Darktrace can be part of a semi-automated SIEM workflow that compiles down the security alerts to a big-picture event management overview. But it is not a silver bullet and not a standalone product.
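The correlation step can be sketched in a few lines. This is a toy version of the idea, not QRadar’s rule engine: events from different log sources are grouped by IP address, and an IP is flagged when at least two distinct sources report it within a short time window. Source names, timestamps and the five-minute window are assumptions.

```python
from collections import defaultdict

def correlate(events, window_seconds=300, min_sources=2):
    """Return {ip: sorted source names} for every IP that was reported by
    at least `min_sources` distinct log sources within `window_seconds`."""
    by_ip = defaultdict(list)
    for event in events:
        by_ip[event["ip"]].append(event)

    hits = {}
    for ip, ip_events in by_ip.items():
        ip_events.sort(key=lambda e: e["ts"])
        for i, first in enumerate(ip_events):
            # Sources seen in the window that starts at this event.
            sources = {
                e["source"]
                for e in ip_events[i:]
                if e["ts"] - first["ts"] <= window_seconds
            }
            if len(sources) >= min_sources:
                hits[ip] = sorted(sources)
                break
    return hits

# Hypothetical events; `ts` is a Unix timestamp in seconds.
events = [
    {"ts": 1000, "source": "darktrace", "ip": "10.0.0.23"},
    {"ts": 1120, "source": "firewall", "ip": "10.0.0.23"},
    {"ts": 1200, "source": "active_directory", "ip": "10.0.0.23"},
    {"ts": 1000, "source": "firewall", "ip": "10.0.0.40"},
    {"ts": 9000, "source": "darktrace", "ip": "10.0.0.40"},
]

print(correlate(events))
```

Only 10.0.0.23 is reported, because three sources saw it within five minutes; the two events for 10.0.0.40 are too far apart to count as one incident.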

Sure, this result list doesn’t look as interesting as the WebGL 3d rendering. But it gets the job done.

Also, I believe the UI ergonomics of DT matter, because it’s more likely that the analyst opens up the IDS :wink:

Summary

In three points:

  • Darktrace is a network anomaly and similarity detection system. It’s focused on internal networks, and not on DMZ and perimeter systems.
  • It has industry-standard interfaces to integrate with SIEMs, such as IBM QRadar, ArcSight etc., or with logging solutions like Splunk, ELK, Sumo Logic etc.
  • Darktrace is a good fit to analyze possible data-breaches because it enriches the alerts with meta-data and connection metrics. This is very important, and a key feature for the analyst to assign a magnitude to the possible incident.

It’s not about the looks. It’s about being able to take a look. Darktrace can save some time for teams that are actively hunting threats within their networks.
I have no plans to test the “Antigena” prevention mode.

Changelog

03.11.2017 - posted the article
04.11.2017 - corrected many errors and realigned some phrases to clarify
15.11.2017 - corrected some typos, fixed some sentences while I was checking for CSS bugs on the site