A botnet is a network of Internet-connected computers, each of which has been compromised and is controlled by coordinated malware instances. Bots on this network run without their owners’ knowledge and send malicious traffic (such as viruses or spam) to other computers on the Internet. Botnets are controlled by a ‘bot-master’ through command-and-control (C&C) channels.
Botnets serve as platforms for distributed denial-of-service (DDoS) attacks, phishing, spamming and other fraudulent activities, which makes botnet detection essential. This tip looks at a botnet detection strategy based on the fast flux characteristics of botnets. With fast flux, a bot-master rapidly rotates the IP addresses that DNS returns for a domain name, hiding the physical location of the botnet's servers. This rapid rebinding of IP addresses to domain names is a characteristic trait of botnets, and it makes the underlying hosts hard to detect.
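The fast flux trait described above can be approximated by counting how many distinct IP addresses each domain name resolves to. The following is a minimal sketch of that idea, not the detection method from the paper: the function name `flag_fast_flux`, the cut-off `min_distinct_ips` and the sample records are all hypothetical, and the addresses come from the ranges reserved for documentation.

```python
from collections import defaultdict


def flag_fast_flux(dns_answers, min_distinct_ips=5):
    """Return {domain: distinct IP count} for domains at or above the cut-off.

    dns_answers: iterable of (domain, ip) pairs taken from DNS responses.
    min_distinct_ips: hypothetical threshold; tune it against known-good traffic.
    """
    ips_by_domain = defaultdict(set)
    for domain, ip in dns_answers:
        # DNS names are case-insensitive, so normalise before grouping.
        ips_by_domain[domain.lower().rstrip(".")].add(ip)
    return {
        domain: len(ips)
        for domain, ips in ips_by_domain.items()
        if len(ips) >= min_distinct_ips
    }


# Made-up sample: one domain rotating through eight addresses, one stable domain.
sample = [("flux.example", f"203.0.113.{i}") for i in range(1, 9)]
sample += [("static.example", "198.51.100.7")] * 8

suspects = flag_fast_flux(sample)
print(suspects)
```

A raw count like this will also flag legitimate services that spread load over many addresses, such as content delivery networks or round-robin DNS, so it is best treated as a first filter before the false-positive elimination discussed below.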
The detection approach that we are about to discuss applies K-Means clustering to DNS data for heuristic detection of fast flux and other typically anomalous botnet characteristics. There are several challenges to be faced in detection of botnets:
- Use of an appropriate network monitoring tool, to collect the data required for analyzing and detecting botnets. Suitable tools include pcap, WinPcap and Wireshark.
- Selection of a clustering algorithm, to ensure relevant and strong correlations between datasets. Ensure use of well-defined attributes for data classification.
- Deciding heuristics for the clustering algorithm, to pinpoint and group “malicious” Internet traffic.
- Identification of fast flux characteristics and elimination of false positives, to ensure high recall and accuracy.
- Employing a dynamic approach, as static and signature-based approaches may not be effective due to anti-X algorithms that make botnets highly dynamic and adaptive, decreasing the rate of successful botnet detection.
Botnets can change their C&C content in terms of encryption, protocols (such as HTTP, IRC and FTP), and structure (either centralized or peer-to-peer), as detailed in Figure 1.
Figure 1: Possible structures of a botnet (a) centralized (b) peer-to-peer.
Courtesy: Guofei Gu et al., BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection.
The framework used for botnet detection employs several steps. A network monitoring tool collects data on the network traffic. The clustering algorithm then classifies traffic, after which heuristics are applied. The data is then separated into different groups and scrutinized for botnet activity.
Botnet detection starts with monitoring network traffic, followed by clustering the captured data and comparing it with that of neighboring nodes to identify bot infections (Figure 2). The steps are:
- Network Traffic Monitoring
- Clustering of Network Traffic Data
- Comparison of clustered data with neighboring nodes
Figure 2: System for fast flux-based botnet detection.
The methodology used is as follows:
- Collection of DNS data with the network monitoring tool and transformation into a .csv file using the Logparser tool: Using Wireshark, DNS traffic is captured on port 53. Wireshark saves the captured data in .cap format, which is converted before being fed into the clustering tool, Weka.
- Insertion of data that mimics a botnet with fast flux characteristics:
a) Domain name and corresponding IP addresses: Detecting fast flux domains reveals malicious intent and is used to flag suspected machines. If a domain name corresponds to a large set of different IP addresses, the machine is flagged as likely bot-infected.
b) Session time between two IP addresses: Source and destination IP addresses that communicate for the same interval of time can be combined to form a cluster, as this is a characteristic of botnet-infected machines.
- Retrieval of the DNS name and respective IP addresses from packet information:
The .cap file containing DNS data is converted into .csv format. A program is used to extract domain names and IP addresses from the Wireshark-captured data file. Next, the IP addresses and domain names are converted into numeric values so that they can be clustered.
- Application of K-means clustering to the data using DNS name:
K-means is a simple unsupervised learning algorithm for clustering problems. It is applied to the data to form K clusters. The main idea behind K-means clustering is to define K centroids, one per cluster, and to assign each data point to its nearest centroid.
Figure 3: Results of clustering; complete vertical lines show infected traffic, indicating the presence of a botnet infection.
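The numeric conversion and K-means steps described above can be illustrated with a small, dependency-free sketch. It is a stand-in for the Weka-based workflow, not a reproduction of it: the helper names `encode`, `dist2` and `kmeans`, the min-max scaling, the farthest-first choice of starting centroids and the sample records are all assumptions made for this example.

```python
import ipaddress


def encode(records):
    """Turn (domain, ip) pairs into numeric 2-D points scaled to [0, 1]."""
    domain_ids = {}
    raw = []
    for domain, ip in records:
        # Each new domain gets the next integer id; the IP becomes its integer value.
        domain_id = domain_ids.setdefault(domain, len(domain_ids))
        raw.append((float(domain_id), float(int(ipaddress.ip_address(ip)))))
    # Min-max scale each column so the large IP integers do not swamp the domain ids.
    columns = list(zip(*raw))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [tuple((v - lo) / s for v, lo, s in zip(p, lows, spans)) for p in raw]


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Lloyd's K-means with deterministic farthest-first starting centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from those chosen so far.
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Made-up records: one domain spread over several networks, one stable domain.
records = [
    ("flux.example", "192.0.2.10"),
    ("flux.example", "192.0.2.77"),
    ("flux.example", "198.51.100.23"),
    ("flux.example", "203.0.113.5"),
    ("flux.example", "203.0.113.200"),
    ("static.example", "198.51.100.7"),
    ("static.example", "198.51.100.7"),
    ("static.example", "198.51.100.7"),
]

centroids, labels = kmeans(encode(records), k=2)

ips_per_cluster = {}
for (domain, ip), label in zip(records, labels):
    ips_per_cluster.setdefault(label, set()).add(ip)
for label, ips in sorted(ips_per_cluster.items()):
    print(f"cluster {label}: {len(ips)} distinct IPs")
```

In this toy data the cluster that holds a single domain spread across many addresses corresponds to the vertical-line pattern noted in Figure 3. On real captures the choice of K and of the feature scaling will affect the grouping considerably, so both would need to be validated against labeled traffic.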
Botnets are a serious threat to network security. The fast flux method can be used effectively to detect botnets at an early stage. The network can thus be secured, and the spread of fraudulent activities such as spamming, phishing and DDoS attacks can be prevented.
This article is based on a paper presented by Nilesh Sharma and Pulkit Mehndiratta at nullcon 2011. Compiled by Varun Haran.
About the experts: Nilesh Sharma and Pulkit Mehndiratta are M.Tech students at IIIT Delhi, specializing in Information Security. Their interest areas include detecting botnets, cyber security, cryptography and cyber forensics. Sharma has lectured at Bhagwant University and B.M.A.S. Engineering College.