In the "DNS Intrusion Detection in Office 365" post, we introduced the strategies implemented in Office 365 to detect anomalous DNS activity. Dnsflow is one of those strategies; it aggregates the DNS data processed by the DNS servers in Office 365. In this post, we discuss the benefits, challenges, and implementation of Dnsflow in detail. We also discuss how Dnsflow enables us to easily build detections that inform us about the potential abuse of DNS in our network.
- Access to complete DNS data
Dnsflow runs on DNS servers, which are the choke point of DNS traffic in our network. As a result, Dnsflow has access to all DNS queries and responses.
- Access to Nameserver IP address
The nameserver IP address is critical forensic information for incident response. DNS servers make outgoing DNS queries to the nameserver, and Dnsflow collects this information.
- High performance and low impact monitoring
Dnsflow uses the Enhanced DNS Logging feature of Windows Server 2012 R2 via the Microsoft-Windows-DNSServer ETW trace, which provides very low-impact monitoring.
- Ability to detect anomalies in near real time
Because Dnsflow data is already aggregated for anomaly detection, very little off-box processing is needed, and the data is ready for the detections to consume at the end of each aggregation interval.
Challenges, Risks, and Limitations
- DNS servers process a very high volume of DNS data. While the data source is low impact, the processing and aggregation done by Dnsflow need to be highly efficient as well.
- There is no visibility into the process that initiated the DNS query on the remote client.
- The accuracy of the nameserver IP address depends on the network topology and the configuration of the DNS server.
- Since DNS servers are the central point for monitoring DNS data in the network, they also become a central point of failure, and downtime may result in a large loss of monitoring data.
Accessing raw DNS data
We use the analytic events of the Microsoft-Windows-DNSServer ETW trace as the raw data source for Dnsflow data. It includes full DNS query and response data. We subscribe to the following events –
- 256: Query received
- 257: Response success
- 261: Recurse response in
We use the krabsetw library in our monitoring agent to consume the ETW events. Once the library is configured for the events above, the agent receives them for each DNS lookup.
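The subscription step can be pictured as a small dispatch table keyed by event ID. The following is a minimal, library-agnostic sketch in Python. It does not use or reproduce the krabsetw API (krabsetw is a C++/C# library), and the `EventRouter` class and the dictionary shape of an event (an `"id"` key plus arbitrary fields) are assumptions made purely for illustration.

```python
from collections import Counter

# Event IDs of the Microsoft-Windows-DNSServer analytic events listed above.
QUERY_RECEIVED = 256
RESPONSE_SUCCESS = 257
RECURSE_RESPONSE_IN = 261


class EventRouter:
    """Hypothetical router: passes decoded events to per-ID handlers."""

    def __init__(self):
        self._handlers = {}    # event ID -> callable taking the event
        self.seen = Counter()  # number of handled events per ID

    def on(self, event_id, handler):
        """Subscribe a handler to one event ID."""
        self._handlers[event_id] = handler

    def dispatch(self, event):
        """Route one event; return False if nobody subscribed to its ID."""
        handler = self._handlers.get(event["id"])
        if handler is None:
            return False
        self.seen[event["id"]] += 1
        handler(event)
        return True
```

In such a design, events with any other ID are dropped at the first dictionary lookup, so unsubscribed traffic costs almost nothing.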
Addressing the challenges, risks and limitations
- Efficient data handling with fast-filtering
Dnsflow filters out very high-volume domains belonging to trusted internet services very early in the monitoring pipeline. Most known DNS intrusion exploits require an attacker to maintain control of the domain's nameserver for an extended time, so hijacking and using these trusted domains is not a viable approach for attackers. Fast-filtering therefore introduces minimal risk and significantly improves data handling.
- Process attribution
Results from detections using the Dnsflow data are joined with the Process and Netflow monitoring data from our servers to attribute a DNS lookup to a process. See the "How to Make Results Actionable" section of "DNS Intrusion Detection in Office 365".
- Identifying nameserver for a domain
Dnsflow records the IP address of the last recursive query as the nameserver for the domain being resolved.
- Preventing and mitigating gaps in monitoring data
The other two detection approaches discussed in the post DNS Intrusion Detection in Office 365 have some overlap with Dnsflow. They help in covering the gaps in case of loss of Dnsflow data.
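Two of the mitigations above, fast-filtering and nameserver identification, lend themselves to a short illustration. The Python sketch below is not the Dnsflow implementation. The allowlist contents, the function and class names, the "last two labels" rule for the second-level domain (which ignores multi-label public suffixes such as `co.uk`), and the choice to correlate recursion by query name are all assumptions.

```python
# Hypothetical allowlist of trusted, very high-volume second-level domains.
TRUSTED_SLDS = frozenset({"example.com", "example.net"})


def normalize(qname):
    """Lower-case a query name and drop the trailing root dot."""
    return qname.rstrip(".").lower()


def second_level_domain(qname):
    """Naive SLD: the last two labels of the name (assumption, see lead-in)."""
    return ".".join(normalize(qname).split(".")[-2:])


def should_drop(qname, trusted=TRUSTED_SLDS):
    """Fast filter: True when the query belongs to a trusted domain."""
    return second_level_domain(qname) in trusted


class NameserverTracker:
    """Remembers the source IP of the most recent recursive response."""

    def __init__(self):
        self._last = {}  # normalized query name -> last responder IP

    def on_recurse_response(self, qname, source_ip):
        # Later responses overwrite earlier ones, so the value left at the
        # end of recursion is the last server that answered.
        self._last[normalize(qname)] = source_ip

    def finish(self, qname):
        """Return and forget the nameserver IP; '' if nothing was recursed."""
        return self._last.pop(normalize(qname), "")
```

Returning an empty string when no recursive response was seen mirrors the case of answers served from the server's cache, where no outgoing query is made.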
Dnsflow data sets
Transforming raw DNS queries and responses into aggregated data is at the heart of this monitoring strategy. The Dnsflow data sets enable us to create baselines of normal activity in the network and build high-fidelity detections that include detailed forensic information. The data consists of three distinct but related data sets, which are logged hourly in our initial release. Here are the details of the data sets.
- Dnsflow Summary
This is the base data set with some attributes about the other two data sets.
- Dnsflow Clients
This is the data set of client endpoints (source nodes) that initiated the DNS query.
- Dnsflow Subdomains
This is the data set of subdomains in the DNS queries.
|Data set||Aggregation keys||Aggregated data points|
+Dnsflow truncates the Clients and Subdomains data sets if the number of distinct values exceeds a threshold. We record truncation events in the logged data sets via these properties and monitor them. The thresholds are set such that truncation is rare; however, truncation is essential for meeting our performance goals. It is important to note that the Dnsflow Summary data set is never truncated, and this data set alone is sufficient for most detections.
Sample DNS queries and responses
The following table shows a sample of DNS queries and responses processed by the DNS server.
|Q Type||Domain||Proto||Source IP||Nameserver IP||…|
Sample Dnsflow data sets
The next three tables show the Dnsflow data sets built from the processing of the above sample data.
Dnsflow Summary data set –
|SLD||Q Type||Proto||Q Count||…|
Dnsflow Clients data set –
|SLD||Q Type||Proto||Source IP||Q Count||…|
Dnsflow Subdomains data set –
|SLD||Q Type||Proto||Subdomains||Q Count||Nameserver IP||…|
++The external nameserver IP address is empty for responses served from DNS data cached on the DNS server.
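To make the relationship between the three data sets concrete, here is a minimal Python sketch of one aggregation interval. It assumes the aggregation keys suggested by the column headers above (SLD, Q Type, Proto, plus Source IP or subdomain). The class name, the `max_distinct` threshold, the naive "last two labels" SLD rule, and the way truncation is recorded are hypothetical and are not taken from the Dnsflow implementation.

```python
from collections import Counter, defaultdict


class DnsflowAggregator:
    """Hypothetical in-memory aggregator for one interval."""

    def __init__(self, max_distinct=1000):
        self.max_distinct = max_distinct
        self.summary = Counter()                # (sld, qtype, proto) -> count
        self.clients = defaultdict(Counter)     # key -> {source IP: count}
        self.subdomains = defaultdict(Counter)  # key -> {subdomain: count}
        self.nameservers = {}                   # (key, subdomain) -> IP
        self.truncated = {"clients": set(), "subdomains": set()}

    def _bump(self, table, name, key, value):
        """Count value under key unless that would exceed the threshold."""
        bucket = table[key]
        if value not in bucket and len(bucket) >= self.max_distinct:
            self.truncated[name].add(key)  # record the truncation event
            return False
        bucket[value] += 1
        return True

    def add(self, qname, qtype, proto, source_ip, nameserver_ip=""):
        labels = qname.rstrip(".").lower().split(".")
        sld = ".".join(labels[-2:])        # naive SLD (assumption)
        subdomain = ".".join(labels[:-2])  # '' when the SLD itself is queried
        key = (sld, qtype, proto)
        self.summary[key] += 1             # the Summary is never truncated
        self._bump(self.clients, "clients", key, source_ip)
        if self._bump(self.subdomains, "subdomains", key, subdomain):
            if nameserver_ip:              # empty for cached responses
                self.nameservers[(key, subdomain)] = nameserver_ip
```

Because the Summary counter is updated before either capped table, its counts stay exact even when the Clients or Subdomains tables stop accepting new distinct values.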
The next table contains some of the detections that we have implemented using the Dnsflow data.
|1) Anomalous query type||Compare the Dnsflow data with the baseline of second-level domain and query type data.|
|2) Anomalous subdomain count||Count the number and rate of distinct subdomains.|
|3) Anomalous long labels in queries||Compare the length of labels in the Dnsflow Subdomains data with the baseline.|
|4) Anomalous rate (bytes per client) of DNS response data||Calculate the rate of inbound bytes in an interval (i.e., sum of response bytes / total number of clients).|
|5) Esoteric domains (domains queried by very few machines)||Select domains with a number of clients below a threshold and high query counts.|
|6) Queries for young DNS domains||Join the domains logged via Dnsflow with the whois data stream and flag domains that were created in the last x days.|
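Detections 3, 4, and 5 reduce to simple computations over the aggregated data. The Python sketch below illustrates them under stated assumptions: the function names, the input shapes, the per-domain baseline dictionary, and every threshold parameter are hypothetical, and real detections would read these values from the logged data sets and their baselines.

```python
def long_label_rows(rows, baseline_max_len, default_max=30):
    """Detection 3: rows whose longest label exceeds the domain's baseline.

    rows: iterable of (sld, subdomain) pairs.
    baseline_max_len: dict mapping an SLD to its usual maximum label length.
    """
    flagged = []
    for sld, subdomain in rows:
        longest = max(len(label) for label in subdomain.split("."))
        if longest > baseline_max_len.get(sld, default_max):
            flagged.append((sld, subdomain))
    return flagged


def bytes_per_client(total_response_bytes, client_count):
    """Detection 4: sum of response bytes / total number of clients."""
    if client_count == 0:
        return 0.0
    return total_response_bytes / client_count


def esoteric_domains(stats, max_clients, min_queries):
    """Detection 5: few distinct clients but a high query count.

    stats: dict mapping an SLD to (distinct client count, query count).
    """
    return sorted(
        sld
        for sld, (clients, queries) in stats.items()
        if clients < max_clients and queries >= min_queries
    )
```

For example, a domain queried 500 times by a single machine would be selected by `esoteric_domains` with `max_clients=3` and `min_queries=100`, while a domain queried 500 times by 50 machines would not.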
We have successfully used Dnsflow to build a baseline of DNS activity in our network and have implemented several detections for DNS-related anomalies. The Dnsflow data also enables us to include highly actionable information in the anomaly detection results. The results inform us about the potential abuse of DNS and help us protect our network from adversaries.