Researcher: Hui Zhang
The Dragnet System for Network Forensic Analysis
The Internet as it stands today is plagued by a wide variety of malicious attacks such as email viruses, worms, DoS attacks, and DDoS attacks. Much research has been done to improve the accuracy and response time of attack detection. However, this is clearly an arms race in which new attacks will be invented to outwit existing signature-based detection and analysis techniques. Another significant research thread is to devise techniques that automatically respond to attacks, either disrupting them or minimizing the damage by pushing the attacks back toward their sources.
In the Dragnet project, we take the position that “good locks alone are not enough for true security.” In both the physical and cyber worlds, it is important to have deterrent mechanisms so that attackers know they may have to bear the consequences of their actions. In the physical world, law enforcement agencies perform forensic analysis of crime scenes to track down criminals. Similarly, on the Internet, we need mechanisms that perform forensic analysis of network data and lead to the identification of the true attackers.
Internet-scale forensic analysis is challenging as all sophisticated attackers use multi-level attacks that begin by compromising or infecting innocent hosts and then use these zombies to initiate and propagate the actual attack.
Existing approaches to defending the network have focused on intrusion-detection and traceback systems. While these techniques allow network operators to determine when some types of attacks are taking place and allow some Denial of Service attacks to be blunted, the techniques only expose activity at the bottom-most level of what are actually multi-level attacks. Identifying the true attackers at the top-most level currently relies on manual effort, which is extremely tedious, time-consuming, and possibly infeasible.
In the Dragnet project, we propose an Internet auditing and forensic analysis system that enables Attacker Identification and Attack Reconstruction. These act as building blocks on which attack investigation can be based and attackers held accountable. Attacker Identification is the ability to accurately pinpoint the source(s) of the attack or infection.
Attack Reconstruction is the process of inferring which communications carry the attack forward. This not only identifies the compromised hosts for subsequent correction, but also provides crucial information about the attack propagation that can help in precluding future attacks of a similar kind. The focus of our work on identifying the true source of attacks through Attacker Identification and Attack Reconstruction differentiates it from other projects that seek to identify when an attack is occurring or to reactively blunt the effect of an attack already in progress. Only methods that identify the attacker, so that legal action can be taken, hold the potential to build a proactive defense strategy that operates by deterring attacks.
Identifying the propagation of an attack is particularly difficult because the adversary is intelligent: attackers are bound to come up with smarter mechanisms to evade detection. Our approach is based on the one invariant across all attacks (present and future): for the attack to progress, there must be communication between the attacker and the associated set of compromised hosts, and the communication flows that cause new hosts to become infected form a causal tree rooted at the source of the attack. While these flows may be subtle or invisible when observed individually from any single host, the tree structure can potentially stand out when the flows are viewed collectively. By identifying the overall structure of attack propagation, our approach can be agnostic to attack signatures or scanning rates and is potentially applicable to many classes of attacks.
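The causal-tree idea can be illustrated with a small sketch of the random moonwalk technique that this description names later. It is a minimal sketch, not the project's implementation: it assumes each walk starts at a randomly chosen flow and repeatedly steps backward in time to a random earlier flow that arrived at the current flow's source. That stepping rule, the window `delta_t`, the `Flow` record, and every function name are illustrative assumptions.

```python
import random
from collections import Counter, namedtuple

# A flow record: source host, destination host, start time (hypothetical schema).
Flow = namedtuple("Flow", ["src", "dst", "t"])


def random_moonwalks(flows, num_walks, max_hops, delta_t, rng):
    """Count how often each flow is traversed by backward-in-time random walks."""
    # Index flows by destination so the predecessors of a flow are easy to find.
    by_dst = {}
    for f in flows:
        by_dst.setdefault(f.dst, []).append(f)

    counts = Counter()
    for _ in range(num_walks):
        current = rng.choice(flows)
        for _ in range(max_hops):
            counts[current] += 1
            # Candidate predecessors: flows that reached current.src within
            # delta_t time units before the current flow started (assumed rule).
            candidates = [
                f for f in by_dst.get(current.src, [])
                if current.t - delta_t <= f.t < current.t
            ]
            if not candidates:
                break
            current = rng.choice(candidates)
    return counts


# Toy trace: an attack spreading from host A, plus one unrelated flow X -> Y.
trace = [
    Flow("A", "B", 1),
    Flow("B", "C", 2), Flow("B", "D", 2),
    Flow("C", "E", 3), Flow("D", "F", 3),
    Flow("X", "Y", 2),
]
counts = random_moonwalks(trace, num_walks=200, max_hops=10, delta_t=1,
                          rng=random.Random(0))
for flow, n in counts.most_common(3):
    print(flow, n)
```

In this toy trace every walk that starts inside the attack tree funnels back to the initial flow from A, so that flow is visited at least as often as any other tree flow. Real traces contain far more normal traffic, and the sketch makes no claim about accuracy there.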
Research Agenda in 2005/2006
Our research effort on the algorithmic side for an Internet-scale forensic analysis system has shown great promise. However, there will be a number of challenges in practice as a result of the sheer scale of Internet traffic and the need for cooperation between different service providers.
Incremental Attack Reconstruction
Given the large number of ISPs and administrative domains (ADs) that make up the Internet, it is likely that Attacker Identification and Attack Reconstruction will be deployed in a piecemeal fashion across the network, so the complete host contact graph may not be available. Some ADs will have very complete traffic auditing, others will have none, and many will audit packets only at their borders and peering points with other ADs. We will formulate different scenarios in which partial deployment can take place, and evaluate the sensitivity of our algorithm to the extent of deployment and cooperation among domains. We are currently exploring a distributed algorithm that incrementally performs random moonwalks across multiple ADs. We believe that our algorithm for worm origin identification can be elegantly adapted to a distributed deployment scenario. In the distributed random moonwalk algorithm, each AD independently selects flows inside its own network to start moonwalks. Since no single AD has a global view of the network, many random walks may stop once they reach a flow that leaves the domain. However, the ADs can then exchange the frequency counts of the border flows where the walks stopped, and continue the walks starting at the selected border flows. This process occurs in an iterative fashion, and at each iteration the information exchanged between ADs is minimal.
In our preliminary evaluation with two networks, each network starts random moonwalks independently. After each round, the two networks exchange flow counts and continue on to the next round of moonwalks. We observe that the causal edge detection accuracy of both networks increases monotonically as the process goes on and eventually converges. The performance is comparable to that achieved by assuming a unified global view of the network. These preliminary results are encouraging. We plan to further investigate the overhead of message exchanges among multiple networks, as well as understand the worst-case scenarios under which the algorithm may have delayed convergence.
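The round-based exchange described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the Dragnet implementation: the `Domain` class, the `Flow` record, the backward-in-time stepping rule with window `delta_t`, and the choice to hand each stopped walk to the AD that owns the border flow's source are all hypothetical. Only (border flow, count) pairs cross AD boundaries.

```python
import random
from collections import Counter, namedtuple

# A flow record: source host, destination host, start time (hypothetical schema).
Flow = namedtuple("Flow", ["src", "dst", "t"])


class Domain:
    """One AD: it logs only flows that touch at least one of its own hosts."""

    def __init__(self, hosts, all_flows):
        self.hosts = set(hosts)
        self.log = [f for f in all_flows
                    if f.src in self.hosts or f.dst in self.hosts]
        self.by_dst = {}
        for f in self.log:
            self.by_dst.setdefault(f.dst, []).append(f)
        self.counts = Counter()

    def walk(self, start, delta_t, rng):
        """Walk backward from start (whose source is inside this AD).

        Returns the border flow at which the walk left the AD, or None if
        the walk ended inside the AD.
        """
        current = start
        while True:
            self.counts[current] += 1
            candidates = [f for f in self.by_dst.get(current.src, [])
                          if current.t - delta_t <= f.t < current.t]
            if not candidates:
                return None
            nxt = rng.choice(candidates)
            if nxt.src not in self.hosts:
                return nxt  # its predecessors are logged by another AD
            current = nxt


def distributed_moonwalks(domains, walks_per_domain, delta_t, rng, max_rounds=10):
    # Round 0: each AD picks start flows that originate inside its own network.
    pending = []
    for d in domains:
        own = [f for f in d.log if f.src in d.hosts]
        picks = [rng.choice(own) for _ in range(walks_per_domain)] if own else []
        pending.append(Counter(picks))

    for _ in range(max_rounds):
        stops = Counter()
        for d, starts in zip(domains, pending):
            for flow, n in starts.items():
                for _ in range(n):
                    border = d.walk(flow, delta_t, rng)
                    if border is not None:
                        stops[border] += 1
        if not stops:
            break
        # Exchange step: each AD resumes the walks whose border flow started
        # inside it. Flows from non-participating ADs are simply dropped.
        pending = [Counter({f: n for f, n in stops.items() if f.src in d.hosts})
                   for d in domains]

    total = Counter()
    for d in domains:
        total.update(d.counts)
    return total


# Toy chain that crosses the AD boundary twice: A -> B -> D -> E -> C -> X.
flows = [Flow("A", "B", 1), Flow("B", "D", 2), Flow("D", "E", 3),
         Flow("E", "C", 4), Flow("C", "X", 5)]
ad1 = Domain({"A", "B", "C", "X"}, flows)
ad2 = Domain({"D", "E"}, flows)
print(distributed_moonwalks([ad1, ad2], 50, 1, random.Random(0)).most_common(2))
```

Because every step moves strictly backward in time, each walk terminates, and in this toy chain all walks eventually reach the first flow from A even though neither AD sees the whole path.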
A Federated Architecture for Deployment in a Multi-AD Environment
Issues of trust and cooperation between domains raise challenges with respect to protecting both domain proprietary information and end user privacy. It is particularly important to prevent data sharing that could either retrieve an arbitrary part of the entire host contact graph recorded by an AD (hence leaking business data about the AD) or read out all flows to or from an arbitrary host (hence violating the privacy of a normal host).
We propose a federated Dragnet architecture where each cooperating AD can independently deploy Dragnet monitors that log traffic and run the random moonwalk algorithm. A nice property of the distributed algorithm is that only frequency counts of border flows are shared across different ADs. No information about internal host communication will be exposed to the other participating ADs. Since the records of border flows between two ADs will likely be logged by both domains, exchanging frequency counts of these flows leaks no private information about either the ADs or their end users.
We plan to further investigate the privacy implications of such data sharing, as attackers can carefully craft traffic patterns to create covert channels that may disclose sensitive information. The creation of a distributed network auditing service will also serve as a practical application of work by ourselves and other Cylab researchers to develop techniques that limit the disclosure of private information without compromising the amount of useful information retrieved, and without adding too much communication overhead.
Generalized Attack Models
Our current algorithms target specific spreading attacks such as worms. There are a large number of other attacks that utilize compromised computers to launch attack traffic, for example, DDoS attacks, where our specific algorithms may be less suitable. While the random moonwalk algorithm specifically exploits the worm “tree” structure, we are working on a more general framework to reveal the causal relationship between communications by correlating local observations.
Local Attack Detection
Knowledge of abnormal end-host traffic patterns can be incorporated to strengthen the power of network forensic analysis. Our observations indicate that normal host behaviors have significant locality in their connection patterns. This key insight can be used to guide the design of better detection and containment mechanisms that are effective even for stealthy worms that are 5-6 orders of magnitude slower than today’s well-known attacks. In the context of enhancing investigative capabilities, local attack detection mechanisms can help guide or refine the results of the attack reconstruction algorithms.
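One way the locality observation might be turned into a local detector is sketched below. This is only an illustration under stated assumptions: the `LocalityDetector` class, its training phase, and the `max_new_destinations` threshold are hypothetical and are not described in the project text. The sketch flags a host once it has contacted more unfamiliar destinations than the threshold allows. It deliberately uses no time window, so a very slow scanner still accumulates unfamiliar destinations.

```python
from collections import defaultdict


class LocalityDetector:
    """Flags hosts that contact many destinations outside their usual set."""

    def __init__(self, max_new_destinations):
        self.max_new = max_new_destinations
        self.known = defaultdict(set)       # host -> destinations seen in training
        self.unfamiliar = defaultdict(set)  # host -> new destinations seen since

    def learn(self, src, dst):
        """Record a connection observed during a (presumed clean) training period."""
        self.known[src].add(dst)

    def observe(self, src, dst):
        """Return True once src has exceeded its budget of unfamiliar destinations."""
        if dst not in self.known[src]:
            self.unfamiliar[src].add(dst)
        return len(self.unfamiliar[src]) > self.max_new


detector = LocalityDetector(max_new_destinations=2)
for dst in ["mail", "web"]:
    detector.learn("h1", dst)
for dst in ["web", "10.0.0.7", "10.0.0.8", "10.0.0.9"]:
    print(dst, detector.observe("h1", dst))
```

Hosts flagged this way could serve as hints for the attack reconstruction step. A deployed detector would also need to age out old entries and tolerate legitimate changes in behavior, which this sketch ignores.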