How Spectre, a bug capable of breaking the industry, was kept secret for seven months

When researcher Michael Schwarz of Graz University of Technology first contacted Intel, he thought he was about to ruin the company's day. He had found a problem in its chips, working with his colleagues Daniel Gruss, Moritz Lipp and Stefan Mangard. The vulnerability was deep and easy to exploit. His team finished writing an exploit on December 3rd, a Sunday afternoon. Recognizing the possible consequences of their findings, they wrote to Intel immediately.

Schwarz received an answer only nine days later. But when the company called him, Schwarz was surprised: it already knew about the CPU problems and was desperately trying to figure out how to fix them. Moreover, the company was doing everything possible to ensure that no one else found out. They thanked Schwarz for his contribution, but told him that what he had discovered was strictly secret, and gave him a date after which the secret could be revealed.

The problem that Schwarz had discovered (and, as he later learned, that many others had discovered too) was potentially catastrophic: a vulnerability at the level of chip design that could slow down every processor in the world, with no perfect fix short of redesigning the chip entirely. It affected almost every major technology company in the world, from Amazon's server farms to chip makers like Intel and ARM. But Schwarz ran into another problem: how do you keep such a serious vulnerability secret long enough for it to be fixed?

Disclosure is an old problem in the security world. When a researcher finds a bug, the custom is to give vendors a few months' head start to fix it before it becomes public and bad actors get a chance to exploit it. But the more companies and products a bug affects, the more difficult this dance becomes. The more software has to be quietly developed and deployed, the more people have to be told about the problem and asked to keep it secret. In the case of Meltdown and Spectre, that coordinated secrecy broke down, and the secret got out before anyone was ready for it.

Premature disclosure has consequences. After the information became public, confusion reigned: for example, over whether AMD's chips were susceptible to Spectre attacks (they are) or whether Meltdown was unique to Intel (some ARM chips were also affected). Antivirus products were caught off guard and inadvertently blocked many critical patches. Other patches had to be pulled after they crashed computers. One of the best tools for mitigating the vulnerability, Retpoline, was developed by Google's incident response team, which had originally planned to release it along with information about the bug. Although the Retpoline team says it was not caught off guard, the code was not published until the day after the vulnerability was first announced, in part because of the unplanned breach of secrecy.

Most troubling of all, many of the key groups that respond to vulnerabilities were left entirely in the dark. The most influential alert about the flaws came from Carnegie Mellon's CERT division, which works with the Department of Homeland Security on vulnerability disclosure. But according to vulnerability analyst Will Dormann, CERT was not aware of the problem until Meltdown and Spectre were made public, which added to the chaos. The original report listed replacing the CPU as the only solution. For a flaw in processor design, this advice was technically correct, but it only stoked panic among IT managers, who imagined prying out and replacing the CPU in every device under their care. A few days later, Dormann and his colleagues decided the advice was not practical and changed the recommendation to simply installing patches.

“I would have liked to know in advance,” says Dormann. “If we had known about this earlier, we would have been able to issue a more accurate document, and people would have been much better informed right away, instead of the current situation, where we have spent the past week testing patches and updating the document.”

But perhaps these problems could not have been avoided? Even Dormann is not so sure. “This is the largest multi-party vulnerability we have ever dealt with,” he told me. “With a vulnerability of this magnitude, there is no way to come out of it cleanly with everyone satisfied.”

The first step in disclosing Meltdown and Spectre came six months before Schwarz's discovery, in a June 1 letter from Jann Horn, a member of Google's Project Zero. Sent to Intel, AMD and ARM, the letter described a new vulnerability that would later be called Spectre, demonstrated an exploit against Intel and AMD processors, and noted troubling implications for ARM. Horn proceeded cautiously and gave the vendors only the minimum necessary information. He addressed the three chip makers specifically and urged each company to work out how to make the issue public and to contact other companies that might be affected. At the same time, Horn warned them not to spread the information too far or too fast.

“Please note that we have not yet reported this to other parts of Google,” Horn wrote. “When reporting this to third parties, try not to spread the information unnecessarily.”

It turned out to be quite difficult to establish who was affected. It started with the chip makers, but it soon became clear that operating systems would need patching, which meant bringing in another group of researchers. Browsers would be affected too, as would the massive cloud services run by Google, Microsoft and Amazon, arguably the most attractive targets for the new bug. In the end, dozens of companies around the world would have to release a patch of some kind.

Project Zero's official policy was to allow 90 days before publication, but as more companies joined the circle of those in the know, Project Zero gave in to their demands and more than doubled that period. As the months went by, companies started releasing their own patches while trying to hide what they were fixing. Google's incident response team was informed in July, a month after Project Zero's first warning. The Microsoft Insiders program released a quiet early patch in November. During this period, Intel CEO Brian Krzanich made a more controversial move, arranging in October an automated sale of stock to be executed on November 29th. On December 14, Amazon Web Services customers received a warning that a wave of reboots on January 5 could affect performance. Another Microsoft patch was compiled and released on New Year's Eve, which suggests that the company's team worked on it through the night. In each case, the reasons for the changes were left vague, and users knew little about what was being fixed.

It is impossible, however, to rewrite the foundations of internet infrastructure without someone noticing. The strongest hints came from the Linux world. Linux, which runs most of the cloud servers on the internet, had to play a large role in any fix for Spectre and Meltdown. But because the system is open source, any changes had to be made in public. Every update was posted to a public Git repository, and all official discussion took place on a publicly accessible mailing list. When kernel patches for a mysterious feature called “page table isolation” began to appear one after another, people watching closely realized that something was up.

The biggest hint came on December 8, when Linus Torvalds accepted a new patch that changed how the Linux kernel works with x86 processors. “This fix, in addition to fixing KASLR leaks, also hardens the x86 code,” Torvalds explained. The latest kernel release had come out just the day before. Normally a patch would have to wait for inclusion in the next release, but for some reason this one was too important. Why would the famously cranky Torvalds so readily accept an out-of-cycle update, especially one that seemed likely to slow the kernel down?

Things looked even stranger when a month-old email surfaced, proposing that the new patch be applied retroactively to old kernels. Summing up the rumors on December 20th, Linux veteran Jonathan Corbet wrote that the page table work "has all the hallmarks of a security patch released under deadline pressure."

And yet they did not know everything. Page table isolation is a way to separate kernel space from user space, so the problem was clearly some kind of leak from the kernel. But it remained unclear exactly what was going wrong in the kernel or how far the bug reached.

The next clue came from the chip makers themselves. The new Linux patch marked all x86 processors as vulnerable, including AMD's. Since the patch reduced performance, AMD was not happy to be included. The day after Christmas, AMD engineer Tom Lendacky sent an email to the Linux kernel mailing list explaining why AMD's chips did not need the patch.

“The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault,” wrote Lendacky.

The explanation is dense with technical terms, but to anyone trying to work out the nature of the bug, it sounded like a fire alarm. Here was an AMD engineer who knew exactly what the vulnerability was, saying that the kernel problem stemmed from something processors had been doing for almost 20 years. If speculative references were the problem, it was everyone's problem, and fixing it would take far more than a simple kernel patch.

"That was the trigger," said Chris Williams, editor of The Register. "Until that moment, no one had mentioned speculative memory references. Only after that email appeared did we realize that something was seriously wrong."

Once it was clear that the problem involved speculative references, researchers could fill in the rest of the picture. For years, security researchers had been looking for ways to attack the kernel through speculative execution; Schwarz's team from Graz published a paper on the subject in June. Anders Fogh published his attempts at a similar attack in July, although they were unsuccessful. Just two days after the AMD email, a researcher who goes by the handle brainsmoke presented work on the topic at the Chaos Communication Congress in Leipzig. None of this work had produced an exploitable bug, but it made clear what one would look like, and it looked extremely bad.

Fogh said it was clear from the very beginning that any working bug would be a disaster. “When you start looking into something like this, you already know that success will have very bad consequences,” he told me. After Meltdown and Spectre were released and chaos broke out, Fogh decided not to publish any further research on the topic.

Over the following week, rumors of the bug began to leak out through Twitter, mailing lists and forums. A casual benchmark that circulated on the PostgreSQL mailing list showed a 17% slowdown, a frightening number for anyone waiting to patch. Other researchers wrote informal posts describing everything they knew while stressing that it was only rumor. “This article is mostly guesswork until the embargo is lifted,” one of the authors wrote. “And on that day we should expect fireworks and drama.”

By New Year's, the rumors had become impossible to ignore. Williams decided it was time to write something. On January 2, The Register published an article about what it called "an Intel processor design flaw." It described the activity on the Linux mailing list, the ominous email from AMD, and the early research. "From what AMD software engineer Tom Lendacky described, it follows that Intel's CPUs speculatively execute code without performing security checks," the article said. "This would allow ring-3-level user code to read ring-0-level kernel data. And that is not good."

The decision to publish proved controversial. The industry had been counting on an embargo on the information, giving companies time to release patches. Spreading the news early cut that time short and gave criminals a chance to exploit the vulnerabilities before patches were available. But Williams maintains that by the time the article ran, the secret was already out. “I thought we had a duty to warn people that when these patches come out, they definitely need to be installed,” says Williams. “If you are smart enough to exploit a bug like this, you would have figured it out without us.”

In any case, the embargo lasted only one more day. The official release had been scheduled for January 9, coinciding with Microsoft's Patch Tuesday and falling in the middle of the Consumer Electronics Show, which might have muffled the bad news. But the combination of wild rumors and available research made the news impossible to contain. Reporters flooded the researchers with emails, and everyone involved struggled to stay quiet as it looked less and less likely that the secret would hold for another week.

The turning point came thanks to brainsmoke. He was one of the few kernel researchers not under embargo, so he took the rumors as a road map and set out to find the bug. The morning after the article in The Register, he found it, and tweeted a screenshot of his terminal as proof. “No page fault is needed,” he wrote in a follow-up tweet. “The main trick, apparently, is getting everything into and out of the cache.”

When the researchers saw the tweet, things started moving. The Graz team was determined not to show its hand before Google or Intel did, but once public proof that the bug could be exploited was circulating, word came from Google that the embargo would be lifted that same day, January 3, at 2 pm Pacific time. At the appointed hour, the full research appeared on two purpose-built websites, along with ready-made logos for each vulnerability. Reports poured in from ZDNet, Wired and The New York Times, often describing information gathered only hours earlier. After more than seven months of planning, the secret was finally out.

It is difficult to say how much the early release cost. Patches are still being developed, and benchmarks are still tallying the total damage. Would it have gone more smoothly with one more week to prepare? Or would that only have delayed the inevitable?

There are plenty of formal documents describing how such vulnerabilities should be announced, from the International Organization for Standardization, the US Department of Commerce and CERT, for example, though they offer little concrete advice for a problem this severe. Experts have wrestled with these questions for years, and the most experienced among them have given up on finding a perfect answer.

Katie Moussouris helped write Microsoft's playbook for events like this, along with the ISO standards and countless other guidelines. When I asked her to rate this week's response, she was gentler than I expected.

“Perhaps nothing better could have been done,” she said. “ISO standards can tell you what to think about, but they won't tell you what to do in the heat of the moment. It's like reading the instructions and running a couple of fire drills. It's good to have a plan, but when your house is on fire, you don't act the way the plan says.”

The more centralized and interconnected technology becomes, the harder it is to avoid such fire alarms. As libraries like OpenSSL proliferate, so does the risk of massive bugs like Heartbleed, the internet's version of a crop blight. This week demonstrated the same effect in hardware. Speculative execution became an industry standard before we had time to secure it. And since most web services run on the same chips and the same cloud services, the risk is multiplied many times over. When the vulnerability finally surfaced, handling its disclosure properly became nearly impossible.

Confusion like this is hard to avoid whenever a core technology fails. “In the 90s we had a motto: one vulnerability, one vendor, and that was the majority of vulnerabilities. Now almost everything involves coordinating multiple stakeholders,” says Moussouris. “This is what real multi-stakeholder disclosure looks like.”

Source: https://habr.com/ru/post/374171/
