DNS Intrusion Detection using Dnsflow

In the post DNS Intrusion Detection in Office 365, we introduced the strategies implemented in Office 365 to detect anomalous DNS activity. Dnsflow is one of those strategies; it aggregates the DNS data processed by the DNS servers in Office 365. In this post, we discuss the benefits, challenges, and implementation of Dnsflow in detail. We also discuss how Dnsflow lets us easily build detections that inform us about potential abuse of DNS in our network.

Benefits

  • Access to complete DNS data
    Dnsflow runs on DNS servers, which are the choke point of DNS traffic in our network. As a result, Dnsflow has access to all DNS queries and responses.
  • Access to Nameserver IP address
    The nameserver IP address is critical forensic information for incident response. DNS servers make outgoing DNS queries to the nameserver, and Dnsflow collects this information.
  • High performance and low impact monitoring
    Dnsflow uses the Enhanced DNS Logging feature of Windows Server 2012 R2 via the Microsoft-Windows-DNSServer ETW trace, which provides very low-impact monitoring.
  • Ability to detect anomalies in near real time
    Because Dnsflow data is already aggregated for anomaly detection, very little off-box processing is needed, and the data is ready for the detections to consume at the end of each aggregation interval.

Challenges, Risks, and Limitations

  • DNS servers process a very high volume of DNS data. While the data source is low impact, the processing and aggregation done by Dnsflow must be highly efficient as well.
  • There is no visibility into the process that initiated the DNS query on the remote client.
  • Nameserver IP address accuracy depends on the network topology and the configuration of the DNS server.
  • Since DNS servers are the central point for monitoring DNS data in the network, they also become a central point of failure, and downtime may result in a large loss of monitoring data.

Implementation

Accessing raw DNS data

We use the analytic events of the Microsoft-Windows-DNSServer ETW trace as the raw data source for Dnsflow. The trace includes full DNS query and response data. We subscribe to the following events:

  • 256: Query received
  • 257: Response success
  • 261: Recurse response in

 

We use the krabsetw library in our monitoring agent to consume the ETW events. Once the library is configured for the events above, our agent receives them for each DNS lookup.

Addressing the challenges, risks and limitations

  • Efficient data handling with fast-filtering
    Dnsflow filters out very high-volume domains belonging to trusted internet services very early in the monitoring pipeline. Most known DNS intrusion exploits require an attacker to maintain control of the domain's nameserver for an extended time, and hijacking the nameservers of these trusted domains for that long is not a viable approach for attackers. Fast-filtering therefore introduces minimal risk and significantly improves data handling.
  • Process attribution
    Results from detections using the Dnsflow data are joined with the Process and Netflow monitoring data from our servers to attribute a DNS lookup to a process. See the "How to Make Results Actionable" section of DNS Intrusion Detection in Office 365.
  • Identifying nameserver for a domain
    Dnsflow takes the IP address of the last recursive query made for a lookup as the nameserver of the domain being resolved.
  • Preventing and mitigating gaps in monitoring data
    The other two detection approaches discussed in the post DNS Intrusion Detection in Office 365 have some overlap with Dnsflow. They help in covering the gaps in case of loss of Dnsflow data.
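
As a rough illustration of the fast-filtering and nameserver-identification steps above, here is a minimal Python sketch. Only the three event IDs come from this post. The `LookupTracker` class, the allow-list contents, the argument names, the assumption that each event carries one relevant IP address, the correlation by query name and type, and the naive "last two labels" second level domain extraction are all hypothetical and are not the Dnsflow agent's actual design.

```python
from dataclasses import dataclass
from typing import Optional

# Event IDs from the Microsoft-Windows-DNSServer analytic events listed earlier.
QUERY_RECEIVED = 256
RESPONSE_SUCCESS = 257
RECURSE_RESPONSE_IN = 261


def second_level_domain(qname: str) -> str:
    """Naive SLD: last two labels. A real agent would need a public suffix list."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


@dataclass
class Lookup:
    qname: str
    qtype: str
    client_ip: str
    nameserver_ip: Optional[str]  # None when the answer came from the server cache


class LookupTracker:
    """Hypothetical correlator that emits one Lookup per successful response."""

    def __init__(self, trusted_slds):
        self.trusted_slds = set(trusted_slds)
        self.last_recursive_ip = {}  # (qname, qtype) -> IP of latest recursive response
        self.completed = []

    def on_event(self, event_id: int, qname: str, qtype: str, ip: str) -> None:
        qname = qname.rstrip(".").lower()
        # Fast filter: drop trusted high-volume domains before any other work.
        if second_level_domain(qname) in self.trusted_slds:
            return
        key = (qname, qtype)
        if event_id == RECURSE_RESPONSE_IN:
            # Later recursive responses overwrite earlier ones, so the last one wins.
            self.last_recursive_ip[key] = ip
        elif event_id == RESPONSE_SUCCESS:
            nameserver_ip = self.last_recursive_ip.pop(key, None)
            self.completed.append(Lookup(qname, qtype, ip, nameserver_ip))
        # QUERY_RECEIVED would feed query counters; it is not needed for this sketch.


tracker = LookupTracker({"trusted-example.com"})
events = [
    (QUERY_RECEIVED, "www.trusted-example.com", "A", "1.1.1.1"),
    (RESPONSE_SUCCESS, "www.trusted-example.com", "A", "1.1.1.1"),
    (QUERY_RECEIVED, "x.cs2.evilzzz.com", "TXT", "2.2.2.2"),
    (RECURSE_RESPONSE_IN, "x.cs2.evilzzz.com", "TXT", "22.22.22.22"),
    (RECURSE_RESPONSE_IN, "x.cs2.evilzzz.com", "TXT", "33.33.33.33"),
    (RESPONSE_SUCCESS, "x.cs2.evilzzz.com", "TXT", "2.2.2.2"),
    (QUERY_RECEIVED, "bing.com", "A", "2.2.2.2"),
    (RESPONSE_SUCCESS, "bing.com", "A", "2.2.2.2"),
]
for event in events:
    tracker.on_event(*event)
```

In this sketch the trusted domain never reaches the tracker state, the evilzzz.com lookup is attributed to the last recursive response it saw, and the bing.com lookup has no nameserver because no recursion happened.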

Dnsflow data sets

Transforming raw DNS queries and responses into aggregated data is at the heart of this monitoring strategy. The Dnsflow data sets enable us to create baselines of normal activity in the network and build high-fidelity detections that include detailed forensic information. The data comprises three distinct but related data sets, which are logged hourly in our initial release. Here are the details of the data sets.

  1. Dnsflow Summary
    This is the base data set with some attributes about the other two data sets.
  2. Dnsflow Clients
    This is the data set of client endpoints (source nodes) that initiated the DNS query.
  3. Dnsflow Subdomains
    This is the data set of subdomains in the DNS queries.

The aggregation keys and aggregated data points of each data set are:

Dnsflow Summary
  Aggregation keys
    • Second level domain
    • DNS query type
    • Transport protocol
  Aggregated data points
    • Query count
    • Response count
    • Query bytes sum
    • Response bytes sum
    • First seen
    • Last seen
    • Longest subdomain
    • Whether Clients data set is truncated+
    • Whether Subdomains data set is truncated+

Dnsflow Clients
  Aggregation keys
    • Second level domain
    • DNS query type
    • Transport protocol
    • Source IP address
  Aggregated data points
    • Query count
    • Response count
    • Query bytes sum
    • Response bytes sum
    • First seen
    • Last seen

Dnsflow Subdomains
  Aggregation keys
    • Second level domain
    • DNS query type
    • Transport protocol
    • Subdomain string
  Aggregated data points
    • Query count
    • Response count
    • Query bytes sum
    • Response bytes sum
    • First seen
    • Last seen
    • Nameserver IP Address

+Dnsflow truncates the Clients and Subdomains data sets if the number of distinct values exceeds a threshold. We record truncation events in the logged data sets via these properties and monitor them. The thresholds are set so that truncation is rare; however, truncation is essential for meeting our performance goals. Note that the Dnsflow Summary data set is never truncated, and this data set alone is sufficient for most detections.
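
One way to picture the truncation behavior is a counter that stops admitting new distinct values once it reaches a cap while the summary count keeps growing. The Python sketch below is only an assumption about how this could work: the `BoundedCounter` name, the choice to drop new values (rather than evict old ones), and the cap of 2 are all hypothetical.

```python
class BoundedCounter:
    """Counts occurrences per distinct value, admitting at most max_distinct values."""

    def __init__(self, max_distinct: int):
        self.max_distinct = max_distinct
        self.counts = {}
        self.truncated = False  # would be logged as the "is truncated" property

    def add(self, value: str) -> None:
        if value in self.counts:
            self.counts[value] += 1
        elif len(self.counts) < self.max_distinct:
            self.counts[value] = 1
        else:
            # New distinct value beyond the cap: drop it, but remember that we did.
            self.truncated = True


summary_query_count = 0                    # the Summary data set is never truncated
subdomains = BoundedCounter(max_distinct=2)
for subdomain in ["a", "b", "a", "c", "d"]:
    summary_query_count += 1
    subdomains.add(subdomain)
```

Here the summary still counts all five queries, while the subdomain detail keeps only the first two distinct values and raises the truncation flag.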

Example

Sample DNS queries and responses

The following table shows a sample of DNS queries and responses processed by the DNS server.

Q Type  Domain                             Proto  Source IP  Nameserver IP
A       onedrive.com                       UDP    1.1.1.1    11.11.11.11
AAAA    onedrive.com                       UDP    1.1.1.1    11.11.11.11
A       photos.onedrive.com                UDP    1.1.1.1    11.11.11.11
A       photos.onedrive.com                UDP    2.2.2.2    11.11.11.11
A       bing.com                           UDP    2.2.2.2    <empty>
TXT     13d74b57c7b.72805.cs2.evilzzz.com  TCP    2.2.2.2    33.33.33.33
TXT     6e574x83c5v.72805.cs2.evilzzz.com  TCP    2.2.2.2    33.33.33.33

Sample Dnsflow data sets

The next three tables show the Dnsflow data sets built from the processing of the above sample data.

Dnsflow Summary data set –

SLD           Q Type  Proto  Q Count
onedrive.com  A       UDP    3
onedrive.com  AAAA    UDP    1
bing.com      A       UDP    1
evilzzz.com   TXT     TCP    2

Dnsflow Clients data set –

SLD           Q Type  Proto  Source IP  Q Count
onedrive.com  A       UDP    1.1.1.1    2
onedrive.com  AAAA    UDP    1.1.1.1    1
onedrive.com  A       UDP    2.2.2.2    1
bing.com      A       UDP    2.2.2.2    1
evilzzz.com   TXT     TCP    2.2.2.2    2

Dnsflow Subdomains data set –

SLD           Q Type  Proto  Subdomains             Q Count  Nameserver IP
onedrive.com  A       UDP    <empty>                1        11.11.11.11
onedrive.com  AAAA    UDP    <empty>                1        11.11.11.11
onedrive.com  A       UDP    photos                 2        11.11.11.11
bing.com      A       UDP    <empty>                1        <empty>++
evilzzz.com   TXT     TCP    13d74b57c7b.72805.cs2  1        33.33.33.33
evilzzz.com   TXT     TCP    6e574x83c5v.72805.cs2  1        33.33.33.33

++The external nameserver IP address is empty for responses served from DNS data cached on the DNS server.
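
To make the aggregation concrete, the following Python sketch rebuilds the query counts of the three sample data sets from the sample queries above. It is a simplified illustration, not the agent's implementation: the function names are hypothetical, only the query count is aggregated (the byte sums, response counts, and first/last seen times are omitted), and the second level domain is taken naively as the last two labels.

```python
from collections import Counter

# (query type, domain, protocol, source IP, nameserver IP or None if cached)
SAMPLE = [
    ("A", "onedrive.com", "UDP", "1.1.1.1", "11.11.11.11"),
    ("AAAA", "onedrive.com", "UDP", "1.1.1.1", "11.11.11.11"),
    ("A", "photos.onedrive.com", "UDP", "1.1.1.1", "11.11.11.11"),
    ("A", "photos.onedrive.com", "UDP", "2.2.2.2", "11.11.11.11"),
    ("A", "bing.com", "UDP", "2.2.2.2", None),
    ("TXT", "13d74b57c7b.72805.cs2.evilzzz.com", "TCP", "2.2.2.2", "33.33.33.33"),
    ("TXT", "6e574x83c5v.72805.cs2.evilzzz.com", "TCP", "2.2.2.2", "33.33.33.33"),
]


def split_domain(domain):
    """Return (second level domain, subdomain string); naive last-two-labels split."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:]), ".".join(labels[:-2])


def aggregate(records):
    summary, clients, subdomains = Counter(), Counter(), Counter()
    nameservers = {}  # Subdomains key -> most recently seen nameserver IP
    for qtype, domain, proto, source_ip, nameserver_ip in records:
        sld, subdomain = split_domain(domain)
        base_key = (sld, qtype, proto)
        summary[base_key] += 1
        clients[base_key + (source_ip,)] += 1
        subdomains[base_key + (subdomain,)] += 1
        nameservers[base_key + (subdomain,)] = nameserver_ip
    return summary, clients, subdomains, nameservers


summary, clients, subdomains, nameservers = aggregate(SAMPLE)
for key, count in summary.items():
    print(*key, count)
```

Each data set shares the same base key (second level domain, query type, protocol), which is what allows a Summary row flagged by a detection to be joined back to its Clients and Subdomains rows.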

Detections

The following list contains some of the detections that we have implemented using the Dnsflow data. For each detection, it shows what the detection detects and an overview of its implementation.

1)  Anomalous query type
    Detects:
      • Exfiltration
      • Infiltration
      • Scanning of IP range
    Implementation overview: Compare the Dnsflow data with a baseline of second level domain and query type data.
      • Anomalous query types such as TXT or MX may be an indication of exfiltration or infiltration.
      • Anomalous PTR queries may be an indication of scanning of an IP range.

2)  Anomalous subdomain count
    Detects:
      • Exfiltration
      • C2 communication
    Implementation overview: Count the number and rate of distinct subdomains.
      • An anomalous total number of subdomains or an anomalous rate of subdomains may be an indication of exfiltration or C2 communication.

3)  Anomalous long labels in queries
    Detects:
      • Exfiltration
      • C2 communication
    Implementation overview: Compare the length of labels in the Dnsflow Subdomains data with the baseline.
      • Long labels may be an indication of C2 communication or exfiltration.

4)  Anomalous rate (bytes per client) of DNS response data
    Detects:
      • Infiltration
    Implementation overview: Calculate the rate of inbound bytes in an interval, that is, the sum of response bytes divided by the total number of clients.
      • An anomalous rate of inbound bytes may be an indication of infiltration.
      • Join with the Clients and Subdomains data to provide additional forensic context.

5)  Esoteric domains (domains queried by very few machines)
    Detects:
      • Flag potential malicious domains
    Implementation overview: Select domains with a number of clients below a threshold and high query counts.
      • These domains could be malicious domains.

6)  Queries for young DNS domains
    Detects:
      • Flag potential malicious domains
    Implementation overview: Join the domains logged via Dnsflow with the whois data stream and flag domains that were created in the last x days.
      • These domains may be malicious domains.
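
To give a feel for how little logic some of these detections need once the data is aggregated, here is a hedged Python sketch of detections 3, 4, and 5. The function names, row shapes, and threshold values are hypothetical; real detections would compare against learned baselines rather than the fixed numbers used here.

```python
def longest_label(subdomain: str) -> int:
    """Length of the longest dot-separated label in a subdomain string."""
    return max((len(label) for label in subdomain.split(".") if label), default=0)


def long_label_rows(subdomain_rows, baseline_max_label):
    """Detection 3: rows of (sld, subdomain) whose longest label exceeds the baseline."""
    return [(sld, sub) for sld, sub in subdomain_rows
            if longest_label(sub) > baseline_max_label]


def response_bytes_per_client(response_bytes_sum: int, client_count: int) -> float:
    """Detection 4: sum of response bytes divided by the total number of clients."""
    return response_bytes_sum / client_count if client_count else 0.0


def esoteric_domains(client_rows, max_clients, min_queries):
    """Detection 5: domains with fewer than max_clients clients and many queries.

    client_rows is an iterable of (sld, source_ip, query_count).
    """
    ips_by_domain, queries_by_domain = {}, {}
    for sld, source_ip, query_count in client_rows:
        ips_by_domain.setdefault(sld, set()).add(source_ip)
        queries_by_domain[sld] = queries_by_domain.get(sld, 0) + query_count
    return sorted(sld for sld, ips in ips_by_domain.items()
                  if len(ips) < max_clients and queries_by_domain[sld] >= min_queries)


client_rows = [
    ("evilzzz.com", "2.2.2.2", 500),
    ("onedrive.com", "1.1.1.1", 300),
    ("onedrive.com", "2.2.2.2", 400),
    ("onedrive.com", "3.3.3.3", 10),
    ("rare.example", "1.1.1.1", 1),
]
flagged = esoteric_domains(client_rows, max_clients=2, min_queries=100)
```

With these made-up rows, only evilzzz.com is flagged: onedrive.com has too many clients, and rare.example has too few queries.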

We have successfully used Dnsflow to build a baseline of DNS activity in our network and have implemented several detections to detect DNS related anomalies. The Dnsflow data also enables us to include highly actionable information in the anomaly detection results. The results inform us about the potential abuse of DNS and help us protect our network from adversaries.

References