Using Machine Learning for DNS Exfiltration / Tunnel Detection

Syed Suleman Qutb
3 min read · Nov 21, 2020

Cyber security defensive mechanisms built on supervised machine learning algorithms rely heavily on historical logs of past cyber attacks to train and tune detection models. The accuracy and relevance of these historical attack datasets determine how effectively the models detect malware or recognize C&C (command and control) domains, and how well they avoid false-positive alerts in SIEM and SOAR systems.

However, unlike in other engineering domains, such ideal datasets are scarce in cyber security. Threat detection is further complicated by the fact that threat actors keep evolving, which makes implementing an effective supervised or reinforcement learning framework a mammoth task. To address this gap (the lack of labeled datasets for the training/validation phase), security researchers can use unsupervised clustering algorithms (such as K-Means, DBSCAN, X-Means, and agglomerative clustering) to group events and identify anomalies in the data.

In this work, we used the Splunk Machine Learning Toolkit to triage anomalous traffic patterns and detect data exfiltration over alternative protocols (described in MITRE ATT&CK T1048). We studied DNS traffic logs ingested into Splunk from multiple data sources (DNS servers, firewalls, WAFs, L7 NetFlow, and Splunk Stream captures). The Splunk Machine Learning Toolkit is a classical data science platform built on the scikit-learn project, and it can be used to detect uncommon or suspicious patterns in network traffic.

MITRE ATT&CK T1048 covers, in particular, cases where a client machine sends significantly more data than it receives over an alternative protocol such as the Domain Name System (DNS). SOC teams often do not monitor DNS closely, because it is widely used as a translation service for domain names and is not intended for data transfer. DNS queries and responses can nevertheless carry data between two connected systems. Unfortunately, this makes DNS an appealing vector for attackers, who can covertly transport commands and exfiltrate data through DNS tunneling.
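The "sends significantly more than it receives" signal can be illustrated with a few lines of Python. This is only a minimal sketch: the record layout `(client_ip, bytes_out, bytes_in)`, the sample values, and the ratio threshold of 1.0 are assumptions made for illustration, not fields or settings taken from the logs described in this post.

```python
from collections import defaultdict

# Hypothetical DNS flow records: (client_ip, bytes_out, bytes_in).
records = [
    ("10.0.0.5", 80, 160),
    ("10.0.0.5", 75, 150),
    ("10.0.0.9", 900, 120),
    ("10.0.0.9", 1100, 130),
]

def bytes_ratio_by_client(records):
    """Sum bytes per client and return the outbound/inbound ratio for each."""
    totals = defaultdict(lambda: [0, 0])
    for client_ip, bytes_out, bytes_in in records:
        totals[client_ip][0] += bytes_out
        totals[client_ip][1] += bytes_in
    # max(..., 1) avoids division by zero for clients with no inbound bytes.
    return {ip: out / max(inb, 1) for ip, (out, inb) in totals.items()}

ratios = bytes_ratio_by_client(records)

# Assumed threshold: a client sending more DNS bytes than it receives.
suspects = sorted(ip for ip, ratio in ratios.items() if ratio > 1.0)
print(suspects)
```

A fixed threshold like this is easy to reason about but hard to tune across networks, which is one motivation for letting a clustering algorithm separate normal from abnormal clients instead.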

We evaluated several algorithms provided by the Splunk ML Toolkit (K-Means, BIRCH, DBSCAN) and found K-Means to be the better option because of its performance and its ability to handle very large datasets. The SPL query that implements our analytics is below:

To detect DNS tunneling in real time, we implemented a component in our Splunk EUNOMATIX MLDETECT app that analyzes outbound bytes in real time with an unsupervised ML algorithm and pinpoints the IP addresses of clients with abnormal traffic flows. For more details on the functionality of our ML-based detection framework, please contact EUNOMATIX at info@eunomatix.com.
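As a rough illustration of clustering clients by outbound bytes, the sketch below runs a tiny one-dimensional K-Means in plain Python and reports the clients that land in the high-volume cluster. It is not the MLDETECT implementation or the SPL query: the `bytes_out_by_client` values, the `kmeans_1d` helper, and the choice of k=2 are all hypothetical stand-ins for what the Splunk ML Toolkit would do with its own K-Means.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Lloyd's algorithm on scalars (requires k >= 2).

    Centroids are seeded evenly between min and max so the result
    is deterministic for this example.
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

# Hypothetical total DNS bytes sent out per client in one time window.
bytes_out_by_client = {
    "10.0.0.5": 1200,
    "10.0.0.6": 1500,
    "10.0.0.7": 1100,
    "10.0.0.8": 1350,
    "10.0.0.9": 250000,
}

clients = list(bytes_out_by_client)
labels, centroids = kmeans_1d([bytes_out_by_client[c] for c in clients])

# Treat the cluster with the largest centroid as the abnormal one.
outlier_cluster = max(range(len(centroids)), key=centroids.__getitem__)
suspects = [c for c, label in zip(clients, labels) if label == outlier_cluster]
print(suspects, centroids)
```

Note that with k=2 some cluster always has the larger centroid, even in benign traffic, so a practical detector would presumably also check how far that centroid sits from the rest before raising an alert.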

Syed Suleman Qutb

Cybersecurity Solutions Architect @ EUNOMATIX, USA. EUNOMATIX specializes in out-of-the-box Cyber Detection & Preemption.