
Mayhem: a machine that can find vulnerabilities in programs and fix them.

The system won first place in the DARPA Cyber Grand Challenge, a competition dedicated to automating ethical hacking.




In 2011, when investor Marc Andreessen wrote that "software is eating the world," the idea was fresh. Now it is obvious that software permeates every aspect of our lives. From sophisticated electronic devices such as medical equipment and self-driving cars to the simplest things, like Internet-connected light bulbs and thermometers, we are surrounded by software.

That means we are more vulnerable than ever to attacks on that software.

Each year, 111 billion lines of code are added to existing programs, and every one of them is a new potential target. Steve Morgan, founder and editor-in-chief of the research firm Cybersecurity Ventures, predicts that system break-ins made through previously unknown weaknesses, what the industry calls "zero-day vulnerabilities," will occur daily in the United States by 2021, whereas in 2015 they occurred weekly.
To address this problem, my colleagues at Carnegie Mellon University and I spent nearly 10 years building technology that could make software secure automatically. Then, in 2012, we founded ForAllSecure to bring our product to the world. All that remained was to prove that we could do what we claimed, and we did it by entering a competition.


The Magnificent Seven: the competing computers glow before an audience at the finals of the Cyber Grand Challenge, held in Las Vegas in 2016.

Fast-forward to 2016: our team was crowded into a Las Vegas hotel restroom, biting our nails, convinced we had lost a competition we had spent several thousand hours preparing for. It was the DARPA Cyber Grand Challenge (CGC), one of several contests, like the robotic-vehicle challenges held in the early 2000s, organized by the Defense Advanced Research Projects Agency to spur technological breakthroughs for national security. The CGC grew out of DARPA's recognition that the United States might one day lack enough people or tools to fend off cyberattacks.

On the cybersecurity battlefield stand tech-savvy hackers, and among the best of them are those who can creatively exploit weak points in software to penetrate an organization's defenses. Criminals who do this for personal gain are called black-hat hackers, and they often build tools that script kiddies, people who simply run ready-made programs, can use to wreak havoc, as happened in 2016, when an Internet of Things botnet launched a massive attack across the Internet after hijacking ordinary people's home surveillance cameras. Conversely, ethical, or white-hat, hackers use their skills to prevent such attacks. But there are not enough ethical hackers to protect all the software, which is proliferating rapidly in the commercial world, to say nothing of the shared infrastructure and military platforms vital to national and global security.

In 2014, DARPA announced the Cyber Grand Challenge, a two-year project to test whether AI systems could be developed to find, verify, and patch software weaknesses. In 2015, a hundred teams entered the prequalification phase. In 2016, seven of them reached the final, where each had to present a system that could reason, one that would not only notice a problem but also make an inference about its origin. The champion would receive $2 million; second and third place, $1 million and $750,000.

After DARPA released these details, my colleagues and I immediately realized that this would be a great opportunity to demonstrate that the automatic cybersecurity system we had developed was not just a theoretical toy. Whenever we talked about ForAllSecure, we ran into skepticism about how practical our solution was. We decided we had to win the competition, given that we had worked on the technology for a full decade.

Our research at the university began with a simple premise: people need a way to check the software they buy and ensure it is secure. Programmers will, of course, do their best to eliminate security problems, but their main concerns are more basic: they have to ship the product on time and make sure it does what it is supposed to do. The problem is that hackers will find ways to make the software do what it is not supposed to do.

Today's best practices in software security involve using special tools to review the source code and flag potential weaknesses. That process produces many false positives, flagging places that are not actually weak, so a human has to go through and check every one of them. To improve the rate at which bugs are found, some companies rely on ethical hackers who perform one-time analyses or take part in bug bounty programs, which pay by the number and severity of the bugs found. But only the most profitable companies can afford the most thorough checks of their software. And the problem only gets harder when the final product incorporates open-source components and other third-party code.

We presented Mayhem, a system that automates the work of ethical hackers. It did not merely point out possible weaknesses; it exploited them, proving they were real vulnerabilities. This was also a key part of the CGC: demonstrating a proof of vulnerability with a working exploit was one way to score points. And because Mayhem was a machine that could be scaled up, its analysis could run at speeds no human could match.


Mayhem, like its six rivals, required water cooling, and the power-consumption and temperature statistics showed that Mayhem was working harder than the others.

We approached the creation of Mayhem with a software-analysis system we had already built at the university, based on formal analysis of a program's text. That method can be likened to writing a mathematical formula that describes every path the program can take, producing an ever-forking analysis tree. Such a tree can quickly become too large to work with, but we came up with clever ways to prune away some paths, collapsing the whole tree to just a few branches. We could then study those remaining branches in more detail.
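To make the pruning idea concrete, here is a minimal sketch of how such a forking path tree can be explored, with infeasible branches cut off early. It uses the z3 constraint solver purely for illustration; the toy branch conditions and function names are assumptions of this sketch, not Mayhem's internals.

```python
from z3 import BitVec, Not, Solver, sat

x = BitVec("x", 32)

# A toy "program" expressed as a sequence of branch conditions on the input x.
branches = [x > 10, x < 5, x == 100]

def explore(path_constraints, remaining):
    """Depth-first walk of the forking path tree, pruning infeasible subtrees."""
    s = Solver()
    s.add(*path_constraints)
    if s.check() != sat:
        return  # prune: no concrete input can reach this path
    if not remaining:
        print("feasible path:", path_constraints, "witness:", s.model())
        return
    cond, rest = remaining[0], remaining[1:]
    explore(path_constraints + [cond], rest)       # branch taken
    explore(path_constraints + [Not(cond)], rest)  # branch not taken

# The pair (x > 10, x < 5) is contradictory, so that whole subtree is skipped.
explore([], branches)
```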

Symbolic execution creates an equation that denotes the logic of the program, for example x + 5 = 7, and then solves the equation. Compare this strategy with another software-analysis method, fuzzing, in which you feed random values to the program in an attempt to crash it; from the crashes you can then determine which vulnerabilities caused them and how they could be used in a deliberate attack. A fuzzer keeps feeding in random data until some input makes the equation true, discovering that x = 2.
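As a toy illustration of the difference, here is a sketch of both strategies applied to the x + 5 = 7 example above. The z3 solver and the helper names are assumptions made for this sketch; this is not Mayhem's code.

```python
import random
from z3 import BitVec, Solver, sat  # assumes the z3-solver package is installed

def buggy(x: int) -> None:
    """Hypothetical target: it crashes only when the input satisfies x + 5 == 7."""
    if x + 5 == 7:
        raise RuntimeError("crash")

def fuzz_for_crash() -> int:
    """Fuzzing: throw random 32-bit inputs at the target until one crashes it."""
    while True:
        x = random.randint(-(2**31), 2**31 - 1)
        try:
            buggy(x)
        except RuntimeError:
            return x  # found by luck; may take billions of attempts

def solve_symbolically() -> int:
    """Symbolic execution: model the crashing branch as an equation and solve it."""
    x = BitVec("x", 32)
    s = Solver()
    s.add(x + 5 == 7)  # the path constraint guarding the crash
    assert s.check() == sat
    return s.model()[x].as_long()  # a single solver query yields x = 2

print(solve_symbolically())  # 2
```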

Both approaches have their strengths, but for many years fuzzing had the advantage of being easy to implement and fast at processing inputs. Symbolic execution held a huge, untapped potential for anyone who could tame it. In the Mayhem system, which we began building in 2010, we managed to do so by combining the two approaches.

Fuzzing makes informed guesses, at great speed, about which inputs might make the program behave in a new way, and then tracks which inputs actually do. Symbolic execution is like asking a mathematician to formally deduce exactly which inputs could create an exploit. We found that some bugs are best found by rapid guessing and others by the mathematical approach, so we decided to run both methods in parallel. Symbolic execution digs deep into one part of the program and derives an input that reaches that section of code. The system then hands that input to the fuzzer, which hammers away at that section at high speed, shaking the vulnerabilities out of it.
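In spirit, the division of labor looks something like the following sketch, in which solver-derived seeds are handed to a fast mutational fuzzer. All of the names are hypothetical, and the real system operates on binaries rather than Python callables.

```python
import random
from typing import Callable, Iterable, List

def hybrid_campaign(
    target: Callable[[bytes], None],
    symbolic_seeds: Iterable[bytes],
    mutations_per_seed: int = 10_000,
) -> List[bytes]:
    """Mutate each solver-derived seed at high speed and collect crashing inputs."""
    crashes: List[bytes] = []
    for seed in symbolic_seeds:  # each seed reaches one deep region of the code
        for _ in range(mutations_per_seed):
            data = bytearray(seed)
            if data:
                i = random.randrange(len(data))
                data[i] = random.randrange(256)  # cheap one-byte mutation
            try:
                target(bytes(data))
            except Exception:
                crashes.append(bytes(data))  # a candidate proof of vulnerability
    return crashes
```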

Another of Mayhem's abilities is working directly on binary code, unlike humans, who study text files, that is, source code. This means the system can analyze a program without help from its developer, which matters greatly for programs that include components written by third parties whose source code can no longer be found. But it is hard to reason about binary code because, unlike source code, it has no functions, no local variables, and no data abstractions. A binary has only one big memory region and fixed-length bit vectors, an efficient data-storage structure. To work with such code you practically have to be a machine, and building a machine that could operate under those restrictions took a great deal of engineering effort.
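A small illustration of why those fixed-length bit vectors change the reasoning: properties that hold for mathematical integers can fail in machine arithmetic because values wrap around. Again, z3 is used here only as an example solver; this is not part of Mayhem.

```python
from z3 import ULE, BitVec, Int, Solver, sat, unsat

# Source-level view: for unbounded integers, x + 1 <= x can never hold.
xi = Int("x")
s1 = Solver()
s1.add(xi + 1 <= xi)
assert s1.check() == unsat

# Binary-level view: a 32-bit value wraps around, so x + 1 <= x (unsigned)
# is satisfiable, which is exactly the overflow a binary analysis must model.
xb = BitVec("x", 32)
s2 = Solver()
s2.add(ULE(xb + 1, xb))
assert s2.check() == sat
print(s2.model()[xb])  # 4294967295: the increment wraps around to zero
```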

After pinpointing a vulnerability, Mayhem generates a working exploit, the kind of code a black-hat hacker could use to break in. The goal is to demonstrate that the exploit can be used to gain privileged access to the operating system. As a result, Mayhem identifies vulnerabilities with absolute certainty rather than merely flagging possible problems, as most code-analysis tools do.

In 2014, we tried Mayhem on every program in the Debian distribution, the popular version of Linux used worldwide on workstations and servers. Mayhem found nearly 14,000 unique vulnerabilities and then narrowed the list down to 250 new ones that deserved the highest priority. The whole check took about a week, because we scaled Mayhem across many servers in the Amazon cloud, and it required hardly any human intervention. We sent the most important findings to the Debian community. One reason we decided to turn our research project into a commercial company was the desire to collaborate with developers at this scale, analyzing thousands of programs in search of vast numbers of vulnerabilities.


The Mayhem project team: engineers from ForAllSecure pose with their creation at the closing ceremony. The author of this article, David Brumley, is in the front row, third from the left.

On June 3, 2015, more than a hundred participants were admitted to the qualifying round and given 131 unique tasks, each containing known vulnerabilities. The seven teams that scored the most points (awarded for finding vulnerabilities and patching them) advanced to the Cyber Grand Challenge final. ForAllSecure scored more than double the points of the second-place qualifier. That brief moment of joy quickly gave way to the realization that the real pressure was only beginning.

Turning the Mayhem technology into a fully autonomous cyber-reasoning system proved to be a very complex undertaking. We managed it in part because DARPA gave the seven finalists enough funding to support a full year of development. The technology's main components are a toolchain that translates executable programs into a language that is comparatively easy to understand and analyze, offensive tools for finding and exploiting vulnerabilities, defensive tools for automatically patching flawed binary code, and a program that coordinates all of this work efficiently.

In preparing for the final, we faced two serious difficulties. First, although we were happy with how well Mayhem found vulnerabilities, we worried that its patches would not be efficient enough. In the competition, as in the real world, there is no point in applying a patch that eats up more CPU time than the problem it fixes is worth. So we spent a long time working on the automatic patching system, getting its overhead down to no more than 5 percent.

Second, we needed a winning strategy. Suppose we find a vulnerability and produce a patch for it. Perhaps we should not roll it out immediately if it slows the program down too much; sometimes it is better to wait and apply the patch only as a last resort. We developed an expert system that decides when to patch the program.
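The flavor of that decision logic can be sketched as a handful of rules. The 5 percent budget comes from the previous paragraph; the specific rule structure, thresholds, and names below are illustrative assumptions, not the actual expert system.

```python
from dataclasses import dataclass

@dataclass
class PatchCandidate:
    overhead: float              # measured fractional slowdown of the patched binary
    vulnerability_proven: bool   # do we hold a working proof of vulnerability?
    rounds_remaining: int        # how much of the competition is left

def should_deploy(patch: PatchCandidate, budget: float = 0.05) -> bool:
    """Deploy a patch only when the protection is worth its runtime cost."""
    if not patch.vulnerability_proven:
        return False  # never pay a performance price for a mere guess
    if patch.overhead <= budget:
        return True   # within the ~5 percent budget: always take the fix
    # A costly patch is held back and applied only as a last resort,
    # for example when little of the competition remains.
    return patch.rounds_remaining <= 5

print(should_deploy(PatchCandidate(0.03, True, 60)))  # True: cheap, proven fix
```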

When our team walked into the Las Vegas ballroom where the final was held on August 5, 2016, we saw seven huge racks with flashing lights on an enormous stage, beneath which sat a reservoir holding 180 tons of water to cool the contestants' computers. The teams had to set up their machines the night before the competition, after which DARPA cut off all access to them. The machines were severed from the Internet and from the outside world in general. All we could do was watch Mayhem's power consumption and temperature, statistics that were displayed next to each rack. Mayhem consistently ran harder than its rivals, and we hoped that was a good sign.

Over nearly 100 rounds, the competing systems received new programs and had to analyze the code for vulnerabilities and issue the patches needed to protect it, all within minutes. Points in each round were awarded for a machine's ability to find and prove the existence of vulnerabilities, as well as for the efficiency of its patches.


Winning by a wide margin: Mayhem built up a big lead, then crashed after the 40th round. But the team did not know about the gap until the very end of the competition.

To make the show more interesting, the organizers decided to announce the scores only at the very end of the competition. That meant that during the event we did not know whether we were winning or losing; we only saw Mayhem submitting reports about the vulnerabilities it found. However, a few hours in, after the 40th round, it became clear that Mayhem had stopped submitting anything. The program had crashed.

Our hearts sank; our worst nightmare seemed to be coming true. We asked the organizers to restart the program, but they would not allow it. With only half the competition complete, we began bracing ourselves for an ignominious defeat.

At the end of the competition the organizers began commenting on the action, and beautiful visualizations showed how each team's machine found and fixed security problems in programs within seconds, compared with the months or years it would take human experts. More than 5,000 people were in the hall, and guest commentators, an astrophysicist and some well-known hackers, fueled their interest. We braced ourselves to hear our defeat announced and confirmed by the numbers on the screen.

But as we watched the points being added up round by round, we suddenly realized that Mayhem's early lead was big enough to keep it in first place, even though it had stopped working after the 40th round. When the results of the final round were announced, a weight lifted from our shoulders. We had won.

Mike Walker, the DARPA program director, said that this demonstration of autonomous cyber defense was "just the beginning of a revolution" in software security. He compared the results to the Wright brothers' first flights, which did not go far but pointed the way to transcontinental flight.

Today, ForAllSecure is selling the first versions of its new service to early customers, including the U.S. government and companies in the high-tech and aerospace industries. At this stage, the service mostly identifies problems, which are then fixed by human experts. For quite a while yet, systems like Mayhem will work together with human security experts to make the world's software safer. But we believe that in the more distant future, machine intelligence will be able to handle the task on its own.

Source: https://habr.com/ru/post/442178/

