DNS Intrusion Detection using Dnsflow

04/25/2017

In the DNS Intrusion Detection in Office 365 post, we introduced the strategies implemented in Office 365 to detect anomalous DNS activity. Dnsflow is one of those strategies; it aggregates the DNS data processed by the DNS servers in Office 365. In this post, we discuss the benefits, challenges, and implementation of Dnsflow in detail. We also discuss how Dnsflow lets us build detections that inform us about potential abuse of DNS in our network.

Benefits

Access to complete DNS data
Dnsflow runs on DNS servers, which are the choke point of DNS traffic in our network. As a result, Dnsflow has access to all DNS queries and responses.

Access to Nameserver IP address
The nameserver IP address is critical forensic information for incident response. DNS servers make outgoing DNS queries to nameservers, and Dnsflow collects this information.

High performance and low impact monitoring
Dnsflow uses the Enhanced DNS Logging feature of Windows Server 2012 R2 via the Microsoft-Windows-DNSServer ETW trace, which provides very low impact monitoring.

Ability to detect anomalies in near real time
Because Dnsflow data is already aggregated for anomaly detection, very little off-box processing is needed, and the data is ready to be consumed by the detections at the end of each aggregation interval.

Challenges, Risks, and Limitations

DNS servers process a very high volume of DNS data. While the data source is low impact, the processing and aggregation done by Dnsflow must be highly efficient as well.
There is no visibility into the process that initiated the DNS query on the remote client.
The accuracy of the nameserver IP address depends on the network topology and the configuration of the DNS server.
Since DNS servers are the central point for monitoring DNS data in the network, they also become a central point of failure, and downtime may result in a large loss of monitoring data.

Implementation

Accessing raw DNS data

We use the analytic events of the Microsoft-Windows-DNSServer ETW trace as the raw data source for Dnsflow data. The trace includes full DNS query and response data. We subscribe to the following events:

256: Query received
257: Response success
261: Recurse response in

We use the krabsetw library in our monitoring agent to consume the ETW events. Once the library is configured for the events above, our agent receives them for each DNS lookup.

Addressing the challenges, risks and limitations

Efficient data handling with fast-filtering
Dnsflow filters out very high-volume domains belonging to trusted internet services very early in the monitoring pipeline. Most known DNS intrusion exploits require an attacker to maintain control of the domain's nameserver for an extended time, so hijacking and using these trusted domains is not a viable approach for attackers. Fast-filtering therefore introduces minimal risk while significantly reducing the volume of data to handle.
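
As an illustration only, the fast-filtering step could look like the following minimal sketch. The allowlist contents, the function names, and the naive "last two labels" rule for finding the second level domain are assumptions made for this example; they are not taken from the Dnsflow agent.

```python
# Hypothetical allowlist of trusted, very high-volume second level domains.
# The entries are placeholders, not the domains filtered in production.
TRUSTED_SLDS = frozenset({"example.com", "example.net"})


def second_level_domain(qname: str) -> str:
    """Naive SLD: the last two labels of the query name.

    Assumption: a single-label public suffix. Names under suffixes such as
    co.uk would need the Public Suffix List to be handled correctly.
    """
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def fast_filter(qnames):
    """Yield only the query names whose SLD is not on the allowlist."""
    for qname in qnames:
        if second_level_domain(qname) not in TRUSTED_SLDS:
            yield qname


queries = [
    "cdn.example.com",
    "EXAMPLE.net.",
    "13d74b57c7b.72805.cs2.evilzzz.com",
]
kept = list(fast_filter(queries))
print(kept)
```

Doing the set lookup before any aggregation keeps the per-query cost for the filtered domains to one string split and one hash lookup.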

Process attribution
Results from detections using the Dnsflow data are joined with the Process and Netflow monitoring data from our servers to attribute a DNS lookup to a process. See the How to Make Results Actionable section of DNS Intrusion Detection in Office 365.

Identifying nameserver for a domain
Dnsflow records the IP address contacted in the last recursive query for a lookup as the nameserver for the domain being resolved.
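
The idea can be sketched as follows, under several assumptions: events are represented as plain dictionaries with hypothetical `id`, `qname`, and `ip` fields, lookups are correlated by query name alone, and the remote address on the last "Recurse response in" event (261) before the "Response success" event (257) is taken as the nameserver. The real ETW event schema and the agent's correlation logic are not described in this post.

```python
RESPONSE_SUCCESS = 257      # event IDs as listed earlier in this post
RECURSE_RESPONSE_IN = 261


def attribute_nameservers(events):
    """Return (qname, nameserver_ip) pairs, one per successful response.

    nameserver_ip is None when no recursive response was seen for the
    lookup, for example when the answer came from the server's cache.
    """
    last_ns = {}   # qname -> remote IP of the most recent recursive response
    results = []
    for ev in events:
        if ev["id"] == RECURSE_RESPONSE_IN:
            # Later recursive responses overwrite earlier ones (referrals).
            last_ns[ev["qname"]] = ev["ip"]
        elif ev["id"] == RESPONSE_SUCCESS:
            # Consume the state so a later cache hit is not misattributed.
            results.append((ev["qname"], last_ns.pop(ev["qname"], None)))
    return results


# Made-up event stream: a referral, then the answering nameserver,
# followed by a lookup answered from cache.
events = [
    {"id": 261, "qname": "cs2.evilzzz.com", "ip": "22.22.22.22"},
    {"id": 261, "qname": "cs2.evilzzz.com", "ip": "33.33.33.33"},
    {"id": 257, "qname": "cs2.evilzzz.com", "ip": "2.2.2.2"},
    {"id": 257, "qname": "bing.com", "ip": "2.2.2.2"},
]
pairs = attribute_nameservers(events)
print(pairs)
```

Correlating by query name alone is a simplification; concurrent lookups for the same name would need a stronger key.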

Preventing and mitigating gaps in monitoring data
The other two detection approaches discussed in the post DNS Intrusion Detection in Office 365 have some overlap with Dnsflow. They help in covering the gaps in case of loss of Dnsflow data.

Dnsflow data sets

Transforming raw DNS queries and responses into aggregated data is at the heart of this monitoring strategy. The Dnsflow data sets enable us to create baselines of normal activity in the network and build high-fidelity detections that include detailed forensic information. The data comprises three distinct but related data sets, which are logged hourly in our initial release. The details of the data sets follow.

Dnsflow Summary
This is the base data set with some attributes about the other two data sets.
Dnsflow Clients
This is the data set of client endpoints (source nodes) that initiated the DNS query.
Dnsflow Subdomains
This is the data set of subdomains in the DNS queries.

Data set	Aggregation keys	Aggregated data points
Dnsflow Summary	Second level domain, DNS query type, transport protocol	Query count, response count, query bytes sum, response bytes sum, first seen, last seen, longest subdomain, whether Clients data set is truncated⁺, whether Subdomains data set is truncated⁺
Dnsflow Clients	Second level domain, DNS query type, transport protocol, source IP address	Query count, response count, query bytes sum, response bytes sum, first seen, last seen
Dnsflow Subdomains	Second level domain, DNS query type, transport protocol, subdomain string	Query count, response count, query bytes sum, response bytes sum, first seen, last seen, nameserver IP address

⁺Dnsflow truncates the Clients and Subdomains data sets if the number of distinct values exceeds a threshold. We record truncation events in the logged data sets via these properties and monitor them. The thresholds are set such that truncation is rare; however, truncation is essential for meeting our performance goals. It is important to note that the Dnsflow Summary data set is never truncated, and this data set alone is sufficient for most detections.
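
The truncation behavior described in the note can be sketched roughly as below. The threshold value, the field names, and the choice to drop new clients once the cap is reached are assumptions for illustration; the post does not state the actual thresholds or the eviction policy.

```python
from collections import defaultdict

MAX_CLIENTS_PER_KEY = 2  # hypothetical threshold; the real value is not published


def aggregate(queries, max_clients=MAX_CLIENTS_PER_KEY):
    """Aggregate (sld, qtype, proto, source_ip) tuples.

    Returns (summary, clients). The summary always counts every query;
    the clients data set stops accepting new source IPs at the cap and
    the summary row records that truncation happened.
    """
    summary = defaultdict(lambda: {"query_count": 0, "clients_truncated": False})
    clients = defaultdict(dict)  # summary key -> {source_ip: query_count}
    for sld, qtype, proto, source_ip in queries:
        key = (sld, qtype, proto)
        summary[key]["query_count"] += 1          # Summary is never truncated
        per_client = clients[key]
        if source_ip in per_client:
            per_client[source_ip] += 1
        elif len(per_client) < max_clients:
            per_client[source_ip] = 1
        else:
            summary[key]["clients_truncated"] = True  # row dropped, fact recorded
    return dict(summary), dict(clients)


rows = [
    ("evilzzz.com", "TXT", "TCP", ip)
    for ip in ("2.2.2.2", "3.3.3.3", "4.4.4.4", "2.2.2.2")
]
summary, clients = aggregate(rows)
print(summary)
print(clients)
```

Capping the number of distinct values bounds memory per aggregation key, while the truncation flag lets a detection know that the Clients data for that key is incomplete.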

Example

Sample DNS queries and responses

The following table shows a sample of DNS queries and responses processed by the DNS server.

Q Type	Domain	Proto	Source IP	Nameserver IP
A	onedrive.com	UDP	1.1.1.1	11.11.11.11
AAAA	onedrive.com	UDP	1.1.1.1	11.11.11.11
A	photos.onedrive.com	UDP	1.1.1.1	11.11.11.11
A	photos.onedrive.com	UDP	2.2.2.2	11.11.11.11
A	bing.com	UDP	2.2.2.2	<empty>
TXT	13d74b57c7b.72805.cs2.evilzzz.com	TCP	2.2.2.2	33.33.33.33
TXT	6e574x83c5v.72805.cs2.evilzzz.com	TCP	2.2.2.2	33.33.33.33

Sample Dnsflow data sets

The next three tables show the Dnsflow data sets built from the processing of the above sample data.

Dnsflow Summary data set –

SLD	Q Type	Proto	Q Count
onedrive.com	A	UDP	3
onedrive.com	AAAA	UDP	1
bing.com	A	UDP	1
evilzzz.com	TXT	TCP	2

Dnsflow Clients data set –

SLD	Q Type	Proto	Source IP	Q Count
onedrive.com	A	UDP	1.1.1.1	2
onedrive.com	AAAA	UDP	1.1.1.1	1
onedrive.com	A	UDP	2.2.2.2	1
bing.com	A	UDP	2.2.2.2	1
evilzzz.com	TXT	TCP	2.2.2.2	2

Dnsflow Subdomains data set –

SLD	Q Type	Proto	Subdomains	Q Count	Nameserver IP
onedrive.com	A	UDP	<empty>	1	11.11.11.11
onedrive.com	AAAA	UDP	<empty>	1	11.11.11.11
onedrive.com	A	UDP	photos	2	11.11.11.11
bing.com	A	UDP	<empty>	1	<empty>⁺⁺
evilzzz.com	TXT	TCP	13d74b57c7b.72805.cs2	1	33.33.33.33
evilzzz.com	TXT	TCP	6e574x83c5v.72805.cs2	1	33.33.33.33

⁺⁺External Nameserver IP address is empty for responses from DNS data cached on the DNS server.
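
To make the transformation concrete, the following sketch rebuilds the query counts of the three sample tables from the sample queries above. It is an illustration, not the agent's code: the function names are hypothetical, the second level domain is assumed to be the last two labels, and only the query count and nameserver IP data points are modeled.

```python
from collections import Counter

# (qtype, domain, proto, source_ip, nameserver_ip) rows from the sample table;
# None stands for the <empty> nameserver of a cached response.
SAMPLE = [
    ("A", "onedrive.com", "UDP", "1.1.1.1", "11.11.11.11"),
    ("AAAA", "onedrive.com", "UDP", "1.1.1.1", "11.11.11.11"),
    ("A", "photos.onedrive.com", "UDP", "1.1.1.1", "11.11.11.11"),
    ("A", "photos.onedrive.com", "UDP", "2.2.2.2", "11.11.11.11"),
    ("A", "bing.com", "UDP", "2.2.2.2", None),
    ("TXT", "13d74b57c7b.72805.cs2.evilzzz.com", "TCP", "2.2.2.2", "33.33.33.33"),
    ("TXT", "6e574x83c5v.72805.cs2.evilzzz.com", "TCP", "2.2.2.2", "33.33.33.33"),
]


def split_domain(domain):
    """Naive split into (subdomain, sld); assumes a single-label public suffix."""
    labels = domain.lower().split(".")
    return ".".join(labels[:-2]), ".".join(labels[-2:])


def build_data_sets(rows):
    """Build the Summary, Clients, and Subdomains data sets (query counts only)."""
    summary, clients, subdomains = Counter(), Counter(), {}
    for qtype, domain, proto, source_ip, ns_ip in rows:
        sub, sld = split_domain(domain)
        summary[(sld, qtype, proto)] += 1
        clients[(sld, qtype, proto, source_ip)] += 1
        row = subdomains.setdefault(
            (sld, qtype, proto, sub), {"query_count": 0, "nameserver_ip": None}
        )
        row["query_count"] += 1
        if ns_ip is not None:
            row["nameserver_ip"] = ns_ip  # keep the last nameserver seen
    return summary, clients, subdomains


summary, clients, subdomains = build_data_sets(SAMPLE)
for key, count in summary.items():
    print(key, count)
```

Note that the nameserver IP is stored as a data point on the Subdomains row rather than as part of its key, matching the data set definition given earlier.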

Detections

The next table contains some of the detections that we have implemented using the Dnsflow data.

Detection	Detects	Implementation overview
1) Anomalous query type	Exfiltration, infiltration, scanning of IP range	Compare the Dnsflow data with a baseline of second level domain and query type data. Anomalous query types such as TXT or MX may indicate exfiltration or infiltration. Anomalous PTR queries may indicate scanning of an IP range.
2) Anomalous subdomain count	Exfiltration, C2 communication	Count the number and rate of distinct subdomains. An anomalous total number or rate of subdomains may indicate exfiltration or C2 communication.
3) Anomalous long labels in queries	Exfiltration, C2 communication	Compare the length of labels in the Dnsflow Subdomains data with the baseline. Long labels may indicate C2 communication or exfiltration.
4) Anomalous rate (bytes per client) of DNS response data	Infiltration	Calculate the rate of inbound bytes in an interval (sum of response bytes / total number of clients). An anomalous rate of inbound bytes may indicate infiltration. Join with the Clients and Subdomains data to provide additional forensic context.
5) Esoteric domains (domains queried by very few machines)	Flag potentially malicious domains	Select domains with a number of clients below a threshold and high query counts. These domains could be malicious.
6) Queries for young DNS domains	Flag potentially malicious domains	Join the domains logged via Dnsflow with the whois data stream and flag domains that were created in the last x days. These domains may be malicious.
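
As a rough sketch of how detections 2, 3, and 5 could be expressed over the Subdomains and Clients data sets, consider the code below. Fixed thresholds stand in for the baseline comparison, and their values are chosen only so that the sample data from this post triggers them; the row shapes and function name are also assumptions for this example.

```python
from collections import defaultdict

# Hypothetical thresholds; a real deployment would derive them from a baseline.
MAX_DISTINCT_SUBDOMAINS = 1
MAX_LABEL_LENGTH = 10
MAX_CLIENTS_FOR_ESOTERIC = 1
MIN_QUERIES_FOR_ESOTERIC = 2


def detect(subdomain_rows, client_rows):
    """subdomain_rows: (sld, subdomain, query_count) tuples.
    client_rows: (sld, source_ip, query_count) tuples.
    Returns {sld: [finding, ...]} for domains with at least one finding.
    """
    distinct_subs = defaultdict(set)
    longest_label = defaultdict(int)
    for sld, sub, _count in subdomain_rows:
        if sub:
            distinct_subs[sld].add(sub)
            longest = max(len(label) for label in sub.split("."))
            longest_label[sld] = max(longest_label[sld], longest)

    client_ips = defaultdict(set)
    queries = defaultdict(int)
    for sld, ip, count in client_rows:
        client_ips[sld].add(ip)
        queries[sld] += count

    findings = defaultdict(list)
    for sld, subs in distinct_subs.items():
        if len(subs) > MAX_DISTINCT_SUBDOMAINS:
            findings[sld].append("anomalous subdomain count")
    for sld, length in longest_label.items():
        if length > MAX_LABEL_LENGTH:
            findings[sld].append("anomalous long label")
    for sld, ips in client_ips.items():
        few_clients = len(ips) <= MAX_CLIENTS_FOR_ESOTERIC
        if few_clients and queries[sld] >= MIN_QUERIES_FOR_ESOTERIC:
            findings[sld].append("esoteric domain")
    return dict(findings)


subdomain_rows = [
    ("onedrive.com", "", 1), ("onedrive.com", "", 1), ("onedrive.com", "photos", 2),
    ("bing.com", "", 1),
    ("evilzzz.com", "13d74b57c7b.72805.cs2", 1),
    ("evilzzz.com", "6e574x83c5v.72805.cs2", 1),
]
client_rows = [
    ("onedrive.com", "1.1.1.1", 2), ("onedrive.com", "1.1.1.1", 1),
    ("onedrive.com", "2.2.2.2", 1), ("bing.com", "2.2.2.2", 1),
    ("evilzzz.com", "2.2.2.2", 2),
]
findings = detect(subdomain_rows, client_rows)
print(findings)
```

With these toy thresholds only evilzzz.com is flagged, by all three checks. In practice each check would compare against per-domain baselines rather than global constants.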

We have successfully used Dnsflow to build a baseline of DNS activity in our network and have implemented several detections to detect DNS related anomalies. The Dnsflow data also enables us to include highly actionable information in the anomaly detection results. The results inform us about the potential abuse of DNS and help us protect our network from adversaries.