About William Scherlis

William L. Scherlis is a full Professor in the School of Computer Science at Carnegie Mellon. He is the founding director of the PhD Program in Software Engineering and director of the Institute for Software Research (ISR) in the School of Computer Science. His research relates to software assurance, software evolution, and technology to support software teams.

Scherlis is involved in a number of activities related to technology and policy, recently testifying before Congress on innovation and information technology, and, previously, on roles for a Federal CIO. He interrupted his career at CMU to serve at DARPA for six years, departing in 1993 as senior executive responsible for coordination of software research. While at DARPA he had responsibility for research and strategy in computer security, aspects of high performance computing, information infrastructure, and other topics.


CyLab Chronicles

Q&A with Bill Scherlis

posted by Richard Power

CyLab Chronicles: Tell us about the Institute for Software Research – when it was established, what are its aims, what companies or groupings are you working with?

SCHERLIS: The Institute for Software Research (ISR) is an academic department within the School of Computer Science (SCS). We are one of six SCS departments, and we have about 250 faculty, staff, and graduate students. In the ISR, we focus primarily on two areas. First, we are the focal point of software engineering research within the university. I say this acknowledging that there is quite a bit of software-related research and education going on at Carnegie Mellon. We have information systems programs across the university. We also have the Software Engineering Institute (SEI), with the mission to advance best practice and transfer technology. In fact, SEI’s related work really is in both of its principal areas of activity – software engineering practices and cyber security. That’s because software assurance is where these two things intersect. That’s the case in ISR also. The difference is that ISR is an academic unit. We have multiple degree programs, including two PhD programs and multiple professional MS programs. We have a PhD program in Software Engineering, now about nine years old, and several MS programs including the venerable Master of Software Engineering program, soon to celebrate its 20th anniversary.

The second primary area in ISR is “Computers, Organizations and Society” (COS), which encompasses topics such as privacy, security, social networks, mobile devices, supply chains, and also topics that relate to the structure and management of distributed engineering teams, such as are now commonplace for large scale software projects. We have a PhD program in COS, in its sixth year.

The ISR’s technical competencies, supporting both Software Engineering and COS, are principally of three kinds: (1) core computer science, (2) organizations, business, economics, markets, (3) policy, regulatory, and societal issues. For example, if you are doing research in mobile devices and you want to do both good science and also have a potential impact in the world, you really need to have expertise in all three of those areas – if you do not understand how the industry is structured and if you don’t understand the regulatory environment, then those smart technical ideas and the good technical and empirical science to support them may not be relevant to achieving a meaningful impact. This multi-faceted style of engagement is characteristic of the culture of ISR.

The same is true with software engineering. Software engineering is the sub-discipline of computer science that infuses engineering principles into software development. That means we not only have to understand the core computer science that we apply – those often gem-like technical results – but also the grisly reality that we are working with actual humans in our engineering organizations and that we have customers, bosses, suppliers, cost constraints, schedule constraints, existing broken code, and so on. This is the framework within which we work to develop innovative new applications.

With respect to cyber security, we engage both areas of ISR – Software Engineering and COS. There is quite a bit of work, for example, that relates to privacy. If our privacy research is to have an impact, we need to understand not just the core computer science issues but also an array of business, financial, legal and policy issues. Even with this understanding there is a lingering question. We recognize privacy as a hugely challenging issue for computer users and IT organizations. But we must nonetheless consider what the possible roles, if any, are for university scientific research in addressing it. One might guess, at first blush, that it is just a matter of adapting business practices and policies, and developing the political will to take on the issue at the policy level. There was some uncertainty about the answer for a while, especially given the amorphous character of the issue.

It is quite clear, now, that there is a preponderance of evidence that we in the university really can make a difference, and that we can have significant impact in addressing the privacy issue, both scientifically and in the world. In ISR we have three faculty members, Lorrie Cranor, Norman Sadeh, and Latanya Sweeney, all of whom are involved in various aspects of privacy research. All have demonstrated quite compellingly the value of doing good science in this area – and the value lies not just in improving our understanding of these issues, but also in achieving an impact in the world on the basis of that science.

A significant portion of the ISR activity contributes to CyLab goals. While CyLab has its own modestly sized staff, it is primarily an umbrella under which sits a wide range of work across the university. CyLab sponsors ISR graduate students in many areas. Under this CyLab umbrella new kinds of collaborations are being created across the university as we contemplate the many dimensions of the cybersecurity challenge. These challenges fit very well with both the ISR portfolio of research areas and our “triple threat” style of engagement with research challenges, which I just described. And so there is a natural affinity. I’ve focused so far on the example of privacy, but we are very active in work relating to software assurance, secure systems architecture, mobility, social networks, and so on.

CyLab Chronicles: Let's explore the interdependence of software assurance and cyber security. You talk about the "evolving software eco-system." What is this in technical terms? What does it mean to the software developer, to the cyber attacker, to the end user? What are the challenges and opportunities in this space?

SCHERLIS: It is an obvious point, but we often need to remind ourselves that most of the functionality and flexibility of modern IT systems is manifest through software. Most people would be surprised to find how much of the functionality of an automobile, or fighter aircraft, or cardiac pacemaker, or music system is manifest in software. When we build software we write policies, define requirements, do architecture, develop tests. But it is the software code that actually executes, and so, in the end, we must reckon with code. Additionally, when we develop security capabilities, they are most often embodied in software, whether they involve crypto, biometrics, e-business, net-centric warfare, or automation for utilities and infrastructure. It has been said more than once that, rather than breaking your clever crypto, it is a lot easier to break the software on either side of it.

This means we must think about software practices if we are going to address many of the core issues of cyber security. And this in turn means focusing quite directly on our ability to produce software that is secure – and on doing so in a way that affords us the opportunity to potentially make promises about the security aspects of that software.

Software assurance is a human judgment about various qualities and characteristics of software. To get to the point where we have enough evidence in hand to make such a judgment with confidence, we need to think clearly about our software and the way we produce it.

We can do that in two ways. We can go bottom-up, thinking about how we write individual lines of code, and how we can design and develop those lines of code in ways that will lead to higher levels of security. We are quite excited by the CyLab research in that area. In ISR several of us are extensively involved in software assurance, focusing on how we can apply software analysis, both static and dynamic, in support of assurance at scale, in a way that works for existing development practices and teams, and in a way that recognizes the reality of systems constructed from diversely-sourced components. We also focus on how, in interacting with advanced tools, we might express the critical security characteristics we wish to assure using those analytic techniques. There is related work in other projects in CyLab, for example the work of Robert Seacord and team in CERT, who are focusing on the development of guidelines for the authoring of secure code.

But we also need to think about it top-down. That involves considering how large scale systems can best be architected and managed, especially given the shifts in the infrastructure and deployment of those large scale systems, for example with increased emphasis on SAAS (software as a service), cloud computing, and the like. Perhaps more importantly, it also involves recognizing the varying levels of trust we confer on the diverse sources of the many components that comprise these systems – the libraries, frameworks, SAAS sources, tools, and internally developed components. A modern system really is an assembly that results from the actions of a diverse and often world-wide supply chain.

This means that to make an assurance claim about the aggregate system, we must think carefully about the relationships among producers and consumers within that supply chain. What kinds of assurance claims can be made by a component or framework producer, and how can a client or consumer verify (and validate) those claims? Let me make this point more concretely. If you are building a modern e-business system, then that system will necessarily have diverse components, such as operating systems, databases, application servers, web servers, possibly various ERP frameworks, and business rules and plug-ins to provide particular capabilities, such as for customer relationship management or supply chain management. In the end, this large system will have contributions from many vendors and, in addition, it will have a fair amount of custom code to tailor all those various components to the needs of our particular enterprise. The challenge in this environment is how to come to the assurance judgment – how we can develop confidence regarding the overall system. We can’t control every single developer, or the choices of the individuals who participate in the development of very large systems, but even if we could (and for many categories of systems this is done) developers still make mistakes. They are humans and will make mistakes, intentionally or not.

This means that we must have practices in place that allow us to catch those mistakes, ranging from large architectural mistakes, to conceptual mistakes such as failing to follow framework API rules, to various coding mistakes, large and small. There is more: we also have to have practices in place that keep developers aware and informed of the best practices with respect to developing code that can support assurance claims. What feedback can we provide developers on the spot as they develop code?

This would be easy if we were developing similar systems over and over. But we’re not so lucky. For major enterprises, success in competing in the marketplace requires innovation – the creation of new value that differentiates from peers – and this innovation motivates novelty in functionalities and, often, novelty in architectural structures and infrastructure choices for new systems. This, combined with rapid evolution of the threat, means we are less often able to rely on precedent and experience than we might hope. We must face the challenge of how we can get to an acceptable level of confidence concerning these new structures and their security characteristics. What are the potential systems level vulnerabilities that are created within these new structures?

An often-cited example is the architecture of modern web-based applications. If you use a modern web-based e-mail or calendar system from any of the major vendors, you will find there is quite a bit of functionality directly manifest in the web browser. For example, the actions associated with hovering over a cell in the matrix of dates and times for your calendar, or the fast response when you click on an e-mail message header. The reason you can get this very fast response is that a lot of data is cached on your computer and a lot of computation is being done directly in your browser. This is accomplished through a series of technologies that together are called AJAX, which is characterized by asynchronous HTTP queries back to a server, quite a bit of JavaScript functionality, and ways to directly access the data structures behind the rendering of web pages in your browser. The AJAX-enabled experience is wonderful for users because it gives fast response – much faster than could be experienced if every action required transacting with a remote server and rendering a new web page. Before AJAX, with the bad old web-based e-mail clients, every action led to a few seconds of delay while you were waiting for network communications, server transactions, and a new page to be rendered. The well-known problem with AJAX is that, relative to the past, there is a much wider interface between your browser and the server. That is, the attack surface of the server may be much larger because there are more kinds of things that go on back and forth between the client and the server, and more expectations.

This means that it is easy for developers to make security mistakes. For example, in designing the server code, it may be all too easy to assume that user inputs are validated in the client code, so server-side validation may seem unnecessary, and the extra work to code the seemingly redundant validation can be skipped. But even assuming that the browser-side code does do the validation, a clever adversary could still interpose a bogus client between the actual client and the server, and then exploit the fact that the server isn’t doing sufficient input validation.
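To make that concrete, here is a minimal Java sketch of the server-side half of the story; the handler and the validation rule are invented purely for illustration. The point is simply that the server re-validates its inputs against its own rule, whatever the browser-side code may already have checked.

```java
import java.util.regex.Pattern;

public class AccountHandler {
    // The server's own whitelist rule for a valid account id,
    // independent of any checks done in the browser-side JavaScript.
    private static final Pattern ACCOUNT_ID = Pattern.compile("[A-Za-z0-9]{1,16}");

    /** Handles a value that arrived from the client over the wide AJAX interface. */
    public String lookupAccount(String accountIdFromClient) {
        // Re-validate on the server: a bogus client can bypass browser-side checks.
        if (accountIdFromClient == null || !ACCOUNT_ID.matcher(accountIdFromClient).matches()) {
            throw new IllegalArgumentException("rejected malformed account id");
        }
        // ... proceed using only the validated value ...
        return "account record for " + accountIdFromClient;
    }

    public static void main(String[] args) {
        AccountHandler h = new AccountHandler();
        System.out.println(h.lookupAccount("abc123"));          // accepted
        try {
            h.lookupAccount("abc'; DROP TABLE users; --");       // rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Rejecting anything that fails the server's own rule keeps the widened client-server interface from becoming a path for unvalidated data into the back end.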

This is the pattern: an innovative but natural architectural choice was made on the basis of a totally compelling value case, focused here on usability, but, unless there is good security thinking at implementation time, there could be new vulnerabilities as a result. Gary McGraw, from Cigital, has an excellent and all-too-entertaining story about how this situation plays out, analogously but even more so, in the world of online computer games.

There is something else I want to say about this software supply chain phenomenon, where multiple producers and consumers effectively collaborate to produce systems, but in whom we invest varying levels of trust. It is easy when people talk about attack surface or input validation to focus your attention on interactions involving a user, a web browser, and a server, as we just did a moment ago. Another favorite example is the SQL injection attack, which is always shown as somebody typing unexpected characters into a box in a form on a web browser. These are legitimate issues, but let’s move on to something a bit more scary and less easy to remediate: When we have large systems constructed out of multiply sourced components, we must consider attack surfaces within the system as well as at the edge, at the user perimeter. These attack surfaces within the system exist because the level of trust that we confer on components from our various diverse sources will vary. That is, we have to contemplate the possibility of malicious code within the system. Perhaps that malicious code arrived with the system, or perhaps it is dynamically loaded code, plug-in code, and the like. Have we created a framework that allows for really useful dynamism, but also allows malware code to be loaded in from an adversary, unbeknownst to us? There are endless scenarios. My point is that we have to think about attack surfaces within the system.
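A small Java sketch of the dynamic-loading pattern at issue (the plug-in interface and class name are invented for illustration): the host loads a class by name at run time and hands it control. Everything that class does then executes inside the system, behind the user-facing perimeter, which is exactly the interior attack surface in question.

```java
public class PluginHost {
    /** Hypothetical plug-in contract the host expects third-party code to implement. */
    public interface Plugin {
        void run();
    }

    public static void main(String[] args) throws Exception {
        // The class name typically comes from configuration, a download, or a registry:
        // a link in the supply chain rather than code we wrote and reviewed ourselves.
        String pluginClassName = args.length > 0 ? args[0] : "com.example.ReportPlugin";

        Class<?> cls = Class.forName(pluginClassName);
        Plugin plugin = (Plugin) cls.getDeclaredConstructor().newInstance();

        // From here on, the plug-in runs with whatever privileges the host has.
        // This call crosses a trust boundary even though no user, browser,
        // or network edge is involved.
        plugin.run();
    }
}
```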

In the old days, there was much focus on the concept of a security perimeter, and much cynical comment directing attention to the Maginot Line in France, and how easily it was pierced by an unexpected assault modality. It still makes sense to talk about firewalls, but only as part of a larger strategy. But in the early days there was an over-reliance on the firewall, starting with the enterprise firewall. The problem is that in large organizations there are a lot of people behind the firewall, and some of them may not have the best of intentions or may just make bad mistakes. So there was a shift in emphasis, or rather an augmentation of focus, to include the departmental firewall. And by similar reasoning we moved from the departmental firewall to the desktop firewall.

But now we have to recognize that if you look at the inventory of software applications, and executables generally, on a particular host, we see an astonishing abundance. And some of those applications may not have the best characteristics with respect to trust. If there is a mix of sensitive applications, for example a portal to the corporate HR system, and some not-so-sensitive applications, for example your music sharing system, then we are motivated to revise our thinking away from the idea of a desktop firewall and put a firewall, in a sense, around each application.

All of this is worthy, but it isn’t enough. Many modern applications are themselves composed of multiple components from multiple suppliers. This impels us to think about those individual components. The moral of the story is that when we speak of the perimeter, from the standpoint of identifying attack surface and identifying mitigating actions, we really need to be thinking about the interior of our applications – the components and the APIs. These comprise the new perimeter.

By this argument, we need to engineer individual components within the system so they can protect themselves, to be safe and robust regardless of bad behavior by the other components with which they interact – their client and service provider components. This means having ways to assure code we develop and also having ways to evaluate code from other sources, even other teams in our organization. This creates important trade-offs. For example, the more we can develop assurance cases for that outside code, the less we need to fully insulate ourselves from all possible bad actions from that code. Really, we need to do both, and with an appropriate balance.

All of this leads us to start thinking much more deeply about our software engineering practices and how we can construct components for which it is possible to provide some evidence to support an assurance claim. When we produce such evidence, we can greatly facilitate the acceptance evaluation process, from a security standpoint, of a client of that component or service. This production of evidence can become an opportunity for component developers to add value by differentiating on the basis of how readily their work can support an assurance case made by their consumers.

All of this adds up to interesting and valuable changes in the rules of the road with respect to the engagements within these software supply chains.

CyLab Chronicles: Talk about "Evidence-based approaches." What are they? What are the issues involved? What do they offer?

SCHERLIS: It was many years ago that software engineers were starting to use iterative or spiral process models in the design and development of innovative or unprecedented systems. But for many years, old-fashioned waterfall thinking nonetheless persisted into the software assurance process. In this old model, the focus of a producer was on fully creating a software application, which, upon completion, would be cleanly handed off to a separate organization to undertake evaluation from a security standpoint. Often, there was some opacity in the code, perhaps in the interests of protecting trade secrets from evaluators, or perhaps just because that was what happened to work. The evaluators would then undertake a kind of reverse engineering process to assess the software with respect to various criteria related to security characteristics.

This waterfall-like model constrained software evaluators for many years and, in my view, held back the pace of progress in efficiently delivering complex highly assured software applications. One of the biggest challenges is that it doesn’t really create a meaningful incentive structure for software developers, whether they are the architects, the team leaders, the line-of-code developers, or the testers. It also put the evaluators in the difficult position of reverse engineering often highly complex code from the standpoint of functionality, security, and the many other attributes, both shallow and deep, that contribute to these.

The new “evidence based” idea is a little different. It has the possibility of addressing this evaluation challenge in a way that creates a much better incentive for developers. The idea is to ask architects, designers, and code developers to produce not only a corpus of models and code, but also a body of evidence to go with that corpus to support various specific assurance claims.

The great feature of this model is that the developers have a role in deciding what is to be the nature of that evidence. This creates encouragement to develop architectures, to select tools and languages, and adopt and adapt practices that make it easier to produce that body of evidence – because better evidence makes it easier for the client to validate it, which means you are better off in terms of succeeding in your business environment. To put it in the simplest terms, the developer’s development practices are influenced by assurance considerations. The incentives are more aligned, and the feedback loop is much tighter. These are two of the keys.

A third key is the incentive created on both sides to make use of the most advanced techniques and tools to provide assurance, including IDEs, analysis tools, modeling notations, and the like. This is because the developers feel that they will face a trade-off between their ability to make a system that can support a strong assurance claim, on the one hand, and their ability to produce a system with aggressive functionality and capability, on the other. There is always this trade-off. When such a trade-off exists we see a great opportunity to push the curve outward, so to speak, by improving practices, so that you can have more of both – more capability and more assurance. And more quickly. That’s the game.

This leads me to the key new ideas in software assurance – related to more aggressive tooling, better practices, and better language and infrastructure choices – to better support our ability to make these claims. And, by the way, these same advanced practices also greatly facilitate traditional acceptance evaluation, since they facilitate that reverse engineering – the creation and verification of models regarding critical behavioral and security attributes.

CyLab Chronicles: Talk to us about "Sound software analysis.” What is it? What are the issues involved? What do they offer?

SCHERLIS: The idea of software analysis is to directly examine software code using tools so we can reach some kind of conclusion, ideally about the universe of all executions of that code. We focus particularly on security and safety characteristics of code, but there are literally thousands of different kinds of characteristics or attributes that can be considered. Each has its own story with respect to the significance of the attribute, the nature of analysis techniques required, the ability of those techniques to give useful conclusions, and our ability to produce code that more readily submits to analysis and so for which we can create assurance.

In my project we are working on both static and dynamic analysis. With static analysis we write tools that mathematically inspect code with the goal of reaching strong conclusions about the behavior of that code. Static analysis techniques are so called because they do not actually run the code. There is also dynamic analysis, which does involve actually running the code. These tools aggregate data that is generated by runs of the code, and then analyze that data to draw various conclusions.

With respect to static analysis, generally speaking, we really want to get to the point where we have a conclusion about the code that applies to all possible executions of the code, and that therefore enables us to make some kind of strong claim about all executions of the code.

But the reality of static analysis is that we can’t always achieve this, and even when we can we can’t often do it at scale, and even when we do it at scale, we can’t often create tools that are usable by working professional developers. But sometimes we can, and I’ll say more about that in a minute.

In practice, we need to make a distinction between sound static analysis on the one hand, and unsound or heuristic static analysis on the other. Heuristic analysis is a little bit like eyeballing the code informally, often just looking at syntactic patterns in the code. It is sort of like a sniff test. It gives you a strong hint of what might be wrong. Many of the tools that are out there, including commercial and open source tools, are in this realm of heuristic static analysis. The big advantage of heuristic static analysis is that you can analyze a lot of code for a large number of distinct characteristics. If you look at the sum total of all the various characteristics of all the various tools that are out there, you are looking at probably close to a thousand different particular quality characteristics. Some of these are fairly shallow – for example they may focus on the “style” in which the code is written, the textual layout and the like, how readable it is, variable names, etc. But at the other extreme, we have some very deep and significant security-related properties for which heuristic analysis is truly valuable and for which the results are readily actionable.
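To give a feel for the kinds of findings involved, here is a small invented Java fragment exhibiting two patterns that pattern-based checkers commonly flag: comparing strings with == and silently swallowing an exception. Both are recognizable from the program text alone, which is why heuristic tools can scan very large code bases quickly.

```java
public class ConfigCheck {
    boolean isProduction(String mode) {
        // Typical heuristic finding: == compares references, not contents.
        // The tool flags the pattern; "production".equals(mode) is the usual fix.
        return mode == "production";
    }

    void loadOptionalFile(java.io.File f) {
        try {
            java.nio.file.Files.readAllBytes(f.toPath());
        } catch (java.io.IOException e) {
            // Another classic pattern-based finding: an empty catch block
            // silently discards the failure.
        }
    }
}
```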

The heuristic character of the tools means they may miss cases in the code where a problem actually exists, and they may also identify areas of code as possibly manifesting the problem where in fact the problem does not exist. That is, the tools can have both false negatives and false positives. A false negative is a defect in the code that the tool does not find. A false positive is a signal or a finding by the tool of a defect, when the code is ok in that particular respect. Depending upon the particular characteristic or attribute that the tool is looking at, you may find large numbers of these false positives, and also large numbers of false negatives. But nonetheless the tool can be useful because it is pointing you to something.

This technology has matured very nicely, both in the research world and in the commercial tools, and as a result the rates for both false positives and false negatives appear to be diminishing. The tools are also getting better with respect to scope (they're covering more characteristics) and usability (they can be better integrated into team and enterprise software practices).

In our work, we have been focusing on what is generally called sound static analysis. In a sound analysis, as distinct from heuristic analysis, we do not produce false negatives. If there is a defect of a particular variety, our sound analysis will find it. I’ll note that a sound analysis may have false positives. The mathematics generally preclude the possibility of having it both ways. In practice, though, we don’t get many false positives. The main point is to avoid false negatives, to not miss a diagnosis. We may occasionally over-diagnose but we will never miss a diagnosis.
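A small invented example shows why a sound analysis can over-report. In every actual execution of this method the dereference is safe, but a sound null-pointer analysis that does not track the relationship between the two conditions must warn anyway, because it refuses to risk a false negative.

```java
public class SoundAnalysisExample {
    static String describe(int x) {
        String s = null;
        if (x > 0) {
            s = "positive";
        }
        if (x > 1) {
            // In every real execution, x > 1 implies x > 0, so s is non-null here.
            // A sound analysis that merges the two branches without relating the
            // conditions still reports a possible null dereference: a false positive,
            // but never a missed defect.
            return s.toUpperCase();
        }
        return "non-positive or one";
    }
}
```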

In our work on sound static analysis, we focus particularly in an area where the conventional techniques for software assurance – testing and inspection – tend to fall down badly. This is the area of deficiencies and defects related to concurrency.

Concurrency is an area of long-standing attention for us. Concurrency is quite devious and devilish. Think of a large group of people, all of whom are working relatively independently of each other, but who need to tag up once in a while. They are all working at their own pace, taking occasional breaks. So, generally speaking, if you give them the same task to do on multiple occasions, it is going to be done slightly differently every time. Think of it as having decks of cards that you shuffle – the events interleave with each other. Every time you shuffle the cards, unless you are a magician, you are going to come up with a slightly different interleaving. This is called nondeterminism.
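Here is a minimal Java sketch of that nondeterminism: two threads incrementing a shared counter without synchronization. The increment is a read-modify-write sequence, so interleavings can lose updates, and the printed total typically differs from run to run.

```java
public class RaceDemo {
    static int counter = 0;   // shared, unsynchronized state

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++;    // read-modify-write: not atomic, so updates can be lost
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Expected 200000, but the interleaving of the two threads usually
        // produces a smaller number, and a different one on each run.
        System.out.println("counter = " + counter);
    }
}
```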

If you look at modern software systems, you will find that pretty much all of the major systems are concurrent, that is to say that they have multiple threads of activity all going simultaneously, with sharing of data and sending of messages among those threads. There are lots of reasons for this multithreading. One reason is that it is easier to write complex software when concurrency is available as an abstraction. Think about writing a program where you have a thread of activity whose job is to do nothing but listen to the network, or to listen to the mouse or the keyboard. We rely on the systems infrastructure to interleave this activity with all the other threads doing other jobs. So it is a natural abstraction.

But there is another reason for concurrency, one that is increasingly compelling right now for even the most modest performance-sensitive systems on desktops. This reason derives from the modern processors, which deliver higher levels of performance by providing multiple cores, or multiple processors, that can compute simultaneously. In the old days, software guys got performance improvements for free – roughly, if you waited 18 months, your code would run twice as fast, because processors were speeding up at that pace.

They are still speeding up at a good pace, but they are doing so by adding more processor cores to the chips. This means that the way to get access to that 2x improvement is to figure out how to re-organize your code so that it can be distributed among multiple processors.

This is like reorganizing a task from one person who is working really hard and fast, to a committee process involving multiple people, each of whom is productive and working simultaneously with the others. This works only if we can figure out how to structure the task so people are spending most of their time doing useful work, and not in committee meetings, or in coordinating, or in waiting for others to complete some prerequisite task. This is a difficult planning problem, especially when we realize that we can’t precisely schedule everything perfectly in advance, since we don’t know the inputs in advance, and we can’t always break tasks into nicely sized chunks that can be worked on concurrently.
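As a sketch of that reorganization, the following Java program splits a sum over a large array into per-thread chunks, with coordination only at the end. The chunking here is deliberately the simplest possible scheme, purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        long[] data = new long[10_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;

        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        int chunk = data.length / workers;
        List<Future<Long>> parts = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            final int from = w * chunk;
            final int to = (w == workers - 1) ? data.length : from + chunk;
            // Each worker sums its own slice; no shared mutable state, so no races.
            parts.add(pool.submit(() -> {
                long s = 0;
                for (int i = from; i < to; i++) s += data[i];
                return s;
            }));
        }

        long total = 0;
        for (Future<Long> p : parts) total += p.get();  // coordinate only at the end
        pool.shutdown();
        System.out.println("total = " + total);
    }
}
```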

So the reality we now face is that our routine desktop and server code needs to be multi-threaded if we want to keep moving up the performance curve. And so we are driven into ad hoc concurrency, both because it is an important and necessary abstraction, and also because it is the path to higher levels of performance.

This leads us to think specifically about the cybersecurity challenges that arise from concurrent code and concurrent activity generally. This is especially frustrating when we consider the traditional methods of gaining assurance, namely testing and inspection. If there’s a one-in-a-million chance that threads will interact in some bad way, perhaps corrupting data or creating a moment of vulnerability, then testing isn’t likely to find this case. If you are going to run test cases, you have a one in a million chance of finding it. But if this code runs every millisecond on each server machine, then you will have run it a million times in 20 minutes, which means you have a pretty good chance of getting zapped by it on a regular basis. If it is running on multiple servers in your data center then you could get whacked by it maybe every few seconds. And if each time the error manifests you get a little bit more state corruption, then you are going to be in a highly unpredictable, highly frustrating situation...

If you are an adversary and you can create conditions to put several threads into a deadlock, or to exploit data corrupted due to a race condition, then you may have a useful advantage. That’s because traditional testing and inspection may not easily eliminate the vulnerability. If you see potential for a race, then the idea is to just keep shuffling the cards, so to speak, until you get that favorable arrangement of circumstances. It is analogous to finding out that the 20 minutes or so spent wiggling the wrong key in the lock may, with good enough probability, open the door. Then perhaps you’ll decide to spend those 20 minutes.
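The deadlock side of that picture is just as simple to write down. In this minimal sketch, two threads acquire the same two locks in opposite orders; whether the program hangs depends entirely on the interleaving, which is why testing so rarely pins such defects down and why an adversary only needs to provoke the bad interleaving once.

```java
public class DeadlockDemo {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static void main(String[] args) {
        new Thread(() -> {
            synchronized (lockA) {
                pause(10);                     // widen the window to make the hang likely
                synchronized (lockB) {
                    System.out.println("thread 1 acquired A then B");
                }
            }
        }).start();

        new Thread(() -> {
            synchronized (lockB) {             // opposite acquisition order
                pause(10);
                synchronized (lockA) {
                    System.out.println("thread 2 acquired B then A");
                }
            }
        }).start();
        // With unlucky (here, likely) timing, each thread holds one lock and waits
        // forever for the other: a deadlock that intermittent testing rarely exposes.
    }

    private static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```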

Every time my team goes out into the field, and we have gone out quite a bit, working with vendors and users of very large software systems, the moment we say the word “concurrency,” our hosts say “We’ve got to share with you one of our horror stories ….” And then they tell us a horror story that typically goes like this – there is some intermittently occurring failure, once in a while it just steps up and bites everybody and then disappears again, and the systems have to be shut down and re-started, or whatever. So the customer organization calls up the vendor, or the integrator, or whoever is responsible for their code, and says, “Hey, we have got this intermittent problem.” And the provider says, “Well, we’re really hoping you can replicate the problem – can you?” And the answer is just what you expect: “No, I just said it was kind of unpredictable and intermittent, it sort of just happens, sometimes here and sometimes there.” So the vendor is thinking, “What do we do now?” And the client is thinking exactly the same thing. Everybody is stymied.

Our project team has responded to such situations, often as follow-ups to our field trials. In several cases, we have responded to the proverbial “3 a.m. phone call” from a development team. There were a few cases where they had been working for weeks, all hands, trying to find one of these intermittent bugs without success. With analysis we were able either to catch the bug or to focus attention narrowly on the segment of code where the defect was hidden.

We became intrigued by the concurrency challenge for all these reasons – the cybersecurity requirements, the software quality challenges, and the difficulties with traditional methods based on testing and inspection. All these led us to thinking hard about software analysis not just for concurrency but also for other difficult “deep” properties. We have developed a portfolio of approaches, including sound static software analysis and also dynamic analysis and heuristic analysis. In our projects we are doing a lot more than concurrency, but it is interesting to highlight concurrency here because it is such a perfect storm of badness with respect to software assurance. We have been working on these approaches now for more than a decade, developing tools focused on sound static analysis for concurrency and transitioning them into practice.
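One common way to drive such analyses is to let the developer state the intended lock policy as a lightweight annotation, which the analysis then checks against every access to the protected state. The sketch below uses a locally defined, illustrative @GuardedBy-style annotation (not the notation of any particular tool) to show the shape of the idea.

```java
public class Account {
    /** Illustrative stand-in for the kind of lock-policy annotation such tools consume. */
    @interface GuardedBy { String value(); }

    private final Object lock = new Object();

    @GuardedBy("lock")
    private long balance;          // stated policy: only touch balance while holding lock

    public void deposit(long amount) {
        synchronized (lock) {
            balance += amount;     // consistent with the stated policy
        }
    }

    public long peek() {
        return balance;            // a sound lock-policy analysis would flag this unprotected read
    }
}
```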

I do want to say that it turns out that the same deep techniques we use for concurrency are also highly applicable for other challenging software attributes. One of these challenges comes from the software supply chain area we discussed earlier in this interview. The component oriented approaches in modern systems lead to large and complex internal APIs, such as frameworks, plugins, and custom client code. These APIs are typically very wide and complex, and there are all sorts of details regarding the particular rules of the road that one has to follow. When you come into the traffic rotary, who has the right of way, the guy in the rotary or the guy coming in? It turns out that the conventions vary from place to place. You have really got to know all the little details of those rules of the road in order to succeed and stay safe. And this is an area where developers already have enormous challenges being successful clients of these frameworks. It is also an area where the framework vendors see a trade-off between the ease with which their clients can develop new applications based on their framework and the degree to which they can provide high levels of capability within that framework. That is a tough trade-off.

We and others, particularly my colleague Jonathan Aldrich and his graduate students, are working to push out that trade-off curve with regard to those frameworks by creating advanced tools that help client developers stay compliant with the rules of the road. It turns out that some of the fundamental analysis techniques that we have developed for concurrency are also highly applicable to documenting and assuring compliance with respect to important areas of API compliance for frameworks.
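A small hypothetical example of those rules of the road: a framework type whose protocol requires connect() before send() and forbids send() after close(). API-compliance analyses of the kind just described target constraints of exactly this call-ordering shape, aiming to catch the violation before the code ever runs.

```java
public class Channel {
    private enum State { FRESH, CONNECTED, CLOSED }
    private State state = State.FRESH;

    // Protocol ("rules of the road"): connect() once, then send() as often as
    // desired, then close(); no send() before connect() or after close().
    public void connect() {
        if (state != State.FRESH) throw new IllegalStateException("already connected or closed");
        state = State.CONNECTED;
    }

    public void send(String msg) {
        if (state != State.CONNECTED) throw new IllegalStateException("send() requires a connected channel");
        System.out.println("sent: " + msg);
    }

    public void close() {
        state = State.CLOSED;
    }

    public static void main(String[] args) {
        Channel ok = new Channel();
        ok.connect();
        ok.send("hello");              // a compliant client
        ok.close();

        Channel bad = new Channel();
        try {
            bad.send("oops");          // protocol violation: the kind of client mistake
                                       // an API-compliance analysis aims to flag statically
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```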

So we started with concurrency but we have broadened out to quite a range of various security and quality attributes that build on similar underlying techniques. And in many of these areas there are similar levels of frustration that have been experienced with existing assurance capabilities based only on testing and inspection.

CyLab Chronicles: Tell us about SureLogic. What has been the impact of your research in the commercial realm? What can you tell us about your real world experiences?

SCHERLIS: We have felt from the beginning of our project that if we are going to move forward in this area, we must address, from the outset, those critical requirements that one way or another bear on the realism of what we can accomplish. There is a kind of standard story we hear from people in the commercial world, speaking of academic projects. They will say, “We have millions of lines of code, and it is messy, nasty stuff. You academic people write papers and derive much glory from well-contrived examples of maybe 50 lines. But we’ve got 50 million lines. Please, show us some evidence that this new-fangled technique of yours could potentially be applied to anything anywhere near the size and complexity and messiness of what we’ve got.”

That is a profound level of skepticism, and history shows that it is profoundly merited. If we are going to create new capabilities for software assurance that we feel have the potential to have genuine impact, we have to address these concerns of realism from the very beginning. There is no escaping it.

Ok. I just want to circle back for a moment to the ISR. The two PhD programs in the ISR have a particular requirement for incoming students which is unlike most other PhD programs with which I am familiar. It is a requirement more similar to what you might see for an MBA program. We want all students entering these programs to have some solid industry experience or operational experience analogous to industry before they come into the program. This has been a tremendous boon in our programs to assuring realism in our choice of research problems and in our approaches to validation of potential scientific results. I was sitting in a project meeting some time ago, and I was looking around the table, and realized I had three PhD students in the room, each of whom had more than a decade of full-time experience, including senior leadership experience, that they were bringing to bear in identifying which research challenges to take up and what kinds of field validation make sense for the science we were contemplating. And I was learning from these students, because they had this incredible wealth of experience.

When we were starting in this work, my PhD students said to me that if we did not take on two key issues at the outset, then we would never get anywhere with this project. First, we needed to address the issue of scale, and to do so from the very beginning, since scalability in modern systems also means composability – that is, taking results that you get for separate components and combining those into a result about the aggregate. Second, we needed to take on a set of issues that we put under the rubric of “adoptability” – this means that the tools we create and the practices that we develop must all be usable by motivated, working professional developers, without sending them back to school for PhDs. The tools and practices must be able to be integrated with conventional team tools and processes, without radical adaptation. And they must yield fairly immediate benefit, which is to say within a few hours of starting an engagement.

So we took this on and we went to the field to validate our early prototypes. Through CyLab we met individuals in many vendor and major client organizations, and were able to organize field trials for the tools. We felt that it was essential in organizing these field trials to deliver immediate value back to our host organizations in exchange for committing a small developer team to the two to three days, typically, that they would be working with us during our visit. We were trying to get useful field feedback, but frankly we also hoped those developers would value this capability and want the tools to stay with them. This meant that we had to make those two to three days fully worthwhile, giving them something that they otherwise would have had to work much harder to get. In fact, we developed a criterion that was even more rigorous. When I talked about this with my team, those industry-savvy grad students said, “Actually, Bill, you need to give them something useful before lunch on the first day of the visit. The reason is that otherwise, while they are off at lunch, they are going to get an emergency phone call, or remember some deadline or some critical problem that needs to be addressed. And so, for one reason or another, they will not come back from lunch. Let’s be real. These are busy developers. Nobody is giving them a break on their deadlines for hanging out with us. It was probably senior management who motivated them to hang out with us, and developers may not feel all that confident in the technical judgments of their senior management.”

So we took on that challenge, what we call the “Before Lunch Test.” We wanted to run our tools on the most challenging production code shared by our hosts, and we wanted to show them things about that code that they would otherwise have no easy way to learn. And those things that we show them have to be genuinely useful and actionable from the standpoint of making repairs and fixes. I’ll say one more thing about these experiences. Because my team all had experience as developers they could engage with our hosts in a way that was structured respectfully and purposefully. Everybody’s code has bugs, especially the concurrent code. Nobody wants to embarrass anybody. That is the reality, and we can work together to make old code better and to make better new code.

Since those days we have run quite a number of field trials, including many repeat visits. So far, we have a clean sweep on the “Before Lunch” test. And so after a number of these field trials it became evident that there was demand enough for us to do a spin-off, SureLogic. This spin-off was directly enabled by our experience with CyLab partners, and it has evolved to provide advanced tools and support well beyond what is possible or suitable for an academic research team. Our academic team must always be moving on to the next set of challenges and ideas. This is why the technology transition process is an important step when we see potential market demand for emerging new capabilities. In our lab we’re working on a new set of problems. Please come visit us.

