Q&A with David Brumley
posted by Richard Power
"I believe software security is much more than arguing about the security of the code compiled. We need to secure the entire life cycle of code, from development, to deployment, to end-user configuration, to eventual retirement. Up till now, most software security research and practice has focused on finding and protecting against vulnerabilities in source code."
[NOTE: For more background on Brumley’s work related to “Analysis and Defense of Vulnerabilities in Binary Code,” watch a video of a compelling presentation, review the full presentation (.pdf) and view a listing of his papers and presentations.]
CyLab Chronicles: What is VINE?
BRUMLEY: VINE is the first-generation tool we developed for analyzing binary (i.e., executable) code. We have since extended VINE into an entire binary analysis platform, which we imaginatively call BAP. BAP allows us to formally, faithfully, and accurately reason about executing a binary program. One of the fundamental uses of BAP is to accurately predict future executions of a program. For example, we've used BAP to reason about executions of vulnerable programs in order to automatically generate vulnerability signatures that filter out exploits, to reason about the execution of malware in order to create better malware detectors, and to find the differences between executions of different implementations of the same protocol (e.g., two web servers) in order to automatically generate fingerprints that can be used to remotely identify which application is running.
BAP is part of our larger research agenda which focuses on securing the entire life-cycle for software. I believe software security is much more than arguing about the security of the code compiled. We need to secure the entire life cycle of code, from development, to deployment, to end-user configuration, to eventual retirement. Up till now, most software security research and practice has focused on finding and protecting against vulnerabilities in source code. BAP fits into our research in the software life-cycle by addressing all the security issues that arise after the source code is compiled.
CyLab Chronicles: What problems does it address -- not just from the technical perspective but from the business and end-user perspectives?
BRUMLEY: One advantage of our approach is that we deal with security issues in the context of the average user. Most users do not have access to the source code of the programs they run. However, almost everyone has access to the programs they execute in at least binary form. Commercial off-the-shelf (COTS) software (e.g., Microsoft Windows, Adobe Acrobat, etc.) is typically only available to end-users in binary form. In addition to legitimate software, businesses and professionals also need the ability to reason about malicious code, which again is typically only available in binary form. Thus, security techniques that only require access to the program binary are likely to be applicable to a large number of people, and in a large number of situations. Further, binary code analysis allows us to argue about the security of the code that will run, not just the code that was compiled. Simply put: binary analysis allows us to reason about the code most people have, and in a way most faithful to what will actually be executed.
CyLab Chronicles: What is unique in its approach?
BRUMLEY: There are two unique aspects of this approach. First, we are striving to do faithful analysis...the sort of analysis that allows us to predict what a program will do. Previous approaches to binary analysis could not predict future executions as accurately or efficiently as we can with BAP. Faithful analysis dovetails into the second unique aspect. We are not only trying to argue whether the code is secure. Since we can predict what the code will do when executed, we can ask extremely interesting questions.
For example, we have shown you can do the following:
- Automatically generate exploits by analyzing patches. Patches reveal the original bug in the program, and attackers know this. When applicable, our techniques require only a few minutes to generate working exploits. This means that anyone who has access to the patch should be, for security purposes, considered armed with an exploit. It also demonstrates the counter-intuitive result that distributing patches does not always help security.
- Automatically discover when two implementations of the same protocol, such as HTTP, may differ. Further, we can automatically generate inputs that trigger such differences. Inputs that trigger differences are important for fingerprinting.
- Automatically generate input filters (aka signatures) based upon analysis of the vulnerability in the program. Unlike other methods, our filters have accuracy guarantees, e.g., anything they say is an exploit is guaranteed to be an exploit. Previous techniques, and even manual signature generation made by experts, cannot make such accuracy guarantees.
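To make the accuracy guarantee in the last item concrete, here is a minimal Python sketch. It is not BAP and not the signature-generation algorithm discussed above; the toy parser, the 8-byte buffer, and all function names are hypothetical, chosen only for illustration. It contrasts a filter derived from the vulnerability condition itself with a filter that matches bytes seen in one observed exploit.

```python
BUFFER_SIZE = 8  # hypothetical fixed-size buffer in the toy program


def toy_program_is_exploited(message: bytes) -> bool:
    """Ground truth for a toy parser that trusts a 1-byte length prefix.

    The parser copies up to `declared_len` payload bytes into a buffer of
    BUFFER_SIZE bytes; the input is an exploit if that copy would overflow.
    """
    if not message:
        return False
    declared_len = message[0]
    payload = message[1:1 + declared_len]
    return len(payload) > BUFFER_SIZE


def vulnerability_signature(message: bytes) -> bool:
    """Filter derived from the overflow condition, not from exploit bytes."""
    if not message:
        return False
    declared_len = message[0]
    available = len(message) - 1
    return declared_len > BUFFER_SIZE and available > BUFFER_SIZE


def pattern_signature(message: bytes) -> bool:
    """Filter that matches bytes taken from one observed exploit."""
    return b"AAAA" in message


def false_positives(signature, inputs):
    """Inputs the signature flags even though they are not exploits."""
    return [m for m in inputs if signature(m) and not toy_program_is_exploited(m)]


def false_negatives(signature, inputs):
    """Exploits the signature fails to flag."""
    return [m for m in inputs if toy_program_is_exploited(m) and not signature(m)]


SAMPLES = [
    b"",                      # empty input
    b"\x04AAAA",              # benign, but contains the exploit pattern
    b"\x10" + b"A" * 16,      # the observed exploit
    b"\x10" + b"B" * 16,      # same bug, different payload bytes
    b"\x10BB",                # large declared length, too little data
]

if __name__ == "__main__":
    for name, sig in [("vulnerability", vulnerability_signature),
                      ("pattern", pattern_signature)]:
        print(name, "false positives:", false_positives(sig, SAMPLES))
        print(name, "false negatives:", false_negatives(sig, SAMPLES))
```

In this toy setting the vulnerability-derived filter never flags a safe input, because it checks exactly the condition under which the overflow occurs. The pattern-based filter both flags a benign message and misses an exploit that uses different payload bytes.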
CyLab Chronicles: What commercial applications do you see BAP contributing to?
BRUMLEY: I see commercial applications for BAP in any scenario where you want to reason about the code you will execute. There are numerous commercial applications using BAP. For example, we have had a successful relationship with Symantec to incorporate our techniques for automatic signature generation, and we have ongoing collaborations with companies for commercializing automatic exploit generation. We are also exploring new opportunities for commercializing work on malware analysis. Companies seem to like our work since it requires only access to a program binary and offers security guarantees; thus, our solutions tend to work in a wide range of scenarios.
CyLab Chronicles: Software security is a vital area of research. What are some of the greatest challenges in this field?
BRUMLEY: I tend to work on two underlying challenges that I believe are fundamental.
First, we need to develop security techniques that offer guarantees. In order to achieve strong security guarantees in practice, I believe it is important to focus on how those systems are implemented. The software actually deployed is a full specification and ground truth for the security offered in the real world. Software security is much more than simply looking for bugs; it is about reasoning about all aspects of how software works in real systems. For example, our work in signature generation is the first that offers accuracy guarantees, e.g., we will never mistake a safe input for an exploit. We achieve these guarantees by analyzing the vulnerability itself. As another example, we have shown that we can break a 1024-bit RSA key in an OpenSSL-enabled Apache server in about 2 hours. Although RSA is secure mathematically, actual implementations leak a lot of information, e.g., in our case, through the amount of time it takes to complete a cryptographic operation. All too often we mistakenly believe something is secure, only to find out that the code itself doesn't adhere to the properties we believe are true on paper.
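The point about implementations leaking information through timing can be illustrated with a small Python sketch. This is not the attack on OpenSSL mentioned above, and it says nothing about how OpenSSL implements RSA. It is a textbook square-and-multiply exponentiation, with a hypothetical operation counter standing in for elapsed time, showing how the amount of work can depend on the bits of a secret exponent.

```python
def modexp_with_op_count(base: int, exponent: int, modulus: int):
    """Textbook right-to-left square-and-multiply.

    Returns (result, ops), where ops counts modular multiplications.
    The count is used here as a deterministic stand-in for running time.
    """
    result = 1
    ops = 0
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            # Extra multiplication happens only when the current key bit is 1.
            result = (result * base) % modulus
            ops += 1
        # Squaring happens for every bit of the exponent.
        base = (base * base) % modulus
        ops += 1
        exponent >>= 1
    return result, ops


if __name__ == "__main__":
    modulus = 1000003          # arbitrary illustrative modulus
    message = 123456
    sparse_key = 0b10000001    # 8-bit exponent with two 1 bits
    dense_key = 0b11111111     # 8-bit exponent with eight 1 bits

    for key in (sparse_key, dense_key):
        value, ops = modexp_with_op_count(message, key, modulus)
        assert value == pow(message, key, modulus)  # mathematically correct
        print(f"key={key:#010b} multiplications={ops}")
```

Both exponents have the same bit length and both results are mathematically correct, yet the operation counts differ with the number of 1 bits. An observer who can measure the work done therefore learns something about the key, even though nothing is wrong with the mathematics.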
The second challenge stems from the first: in order to reason about the security of real systems, we need to be able to reason about real code. However, real code is complex. Thus, we need to continue to develop more scalable and more efficient techniques, while not sacrificing accuracy. This challenge is not unique to security; it is also found in formal methods, compilers, and even programming languages. Thus, any advances along this front are likely to be applicable to many disciplines within computer science.