
Meltdown and Spectre for the cloud: our risk assessment and how we patched



The new year got off to an unusual start. Instead of spending it at family gatherings, the operations team was closely tracking the Meltdown and Spectre processor vulnerabilities. In theory, they posed a threat to customer data and keys. In short, exploiting them looks like this:

- Do you have an AKSU for sale?
- No.
- And a KPVT?
- No.
- And grenades?
- Uh... what we don't have, we don't have.
That is, you can build a series of queries that indirectly reveals what is stored in the RAM of a physical host by measuring the processor's response time. In the first half of January, OS and hypervisor vendors rolled out patches that close this loophole but also cost the systems some performance.
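The idea of learning a value from response times alone can be illustrated with a toy model. The sketch below is only a simulation, not an exploit: the "cache" is a Python set, the latencies `FAST` and `SLOW` are made-up constants, and all names (`ToyCache`, `victim`, `recover_byte`, `recover_secret`) are hypothetical.

```python
# Toy model of a cache-timing side channel. Nothing here touches real
# hardware: the "cache" is a set and the latencies are invented constants.
FAST, SLOW = 1, 100  # hypothetical cost of a cached / uncached access


class ToyCache:
    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, line):
        """Return the simulated latency and leave the line cached."""
        latency = FAST if line in self.lines else SLOW
        self.lines.add(line)
        return latency


def victim(cache, secret_byte):
    # The victim never returns the secret. It only touches a cache line
    # whose index depends on the secret, which is the observable side effect.
    cache.access(secret_byte)


def recover_byte(cache, secret_byte):
    cache.flush()
    victim(cache, secret_byte)
    # Probe every candidate line once; the single fast answer gives it away.
    timings = {line: cache.access(line) for line in range(256)}
    return min(timings, key=timings.get)


def recover_secret(secret):
    cache = ToyCache()
    return bytes(recover_byte(cache, b) for b in secret)
```

In this model `recover_secret(b"key")` gives back `b"key"` even though the attacker code only ever looks at latencies. Real attacks have to deal with noise, speculation windows, and hardware details that the simulation leaves out.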

We were most worried about DBMSs, because they were expected to see the heaviest syscall load, and cloud resource consumption could have grown by more than 10%.

Looking ahead a little: in some tests, MS SQL for some reason runs faster with the patches.

Performance tests


Since we don't roll anything out to Technoserv Cloud blindly, we prepared for these patches fully armed: we built two test environments with ESXi and Windows Server 2012 R2, one left on the old versions and one that we updated as soon as the patched releases came out.

The test results turned out to be strange. Here is the testing methodology:

INITIAL CONFIGURATIONS AND CONDITIONS:
Updated system
Server:
Dell PowerEdge R630
2 x Intel Xeon E5-2690 v4 2.6 GHz
320 GB RAM
UEFI v2.7.0

Hypervisor: VMware ESXi 6.0 Build 7504637

Virtual machine:
VM version 11
8 vCPU (1 socket)
24 GB RAM
Disk 0 (System) Thick provision Eager Zeroed - PVSCSI Controller 0 on Dell SC9000 SSD + HDD Profile
Disk 1 (DB Files) Thick provision Eager Zeroed - PVSCSI Controller 1 on Dell SC9000 SSD Profile
Disk 2 (Transaction Log Files) Thick provision Eager Zeroed - PVSCSI Controller 2 on Dell SC9000 SSD Profile
NIC: VMXNet3
OS: Microsoft Windows Server 2012 R2 2018-01 Monthly Rollup (KB4056895)
DB: Microsoft SQL Server 2016 Enterprise Edition with SP1

Tests:
7-Zip v16.04 64-Bit (4 CPU Threads / 192 MB Dictionary)
HammerDB v2.23 (32 Warehouses, 8 vUsers, Rampup time 2 min, Test Duration 10 min)

Non-updated system
Server:
Dell PowerEdge R630
2 x Intel Xeon E5-2690 v4 2.6 GHz
320 GB RAM
UEFI v2.6.0

Hypervisor: VMware ESXi 6.0 Build 5572656

Virtual machine:
VM version 11
8 vCPU (1 socket)
24 GB RAM
Disk 0 (System) Thick provision Eager Zeroed - PVSCSI Controller 0 on Dell SC9000 SSD + HDD Profile
Disk 1 (DB Files) Thick provision Eager Zeroed - PVSCSI Controller 1 on Dell SC9000 SSD Profile
Disk 2 (Transaction Log Files) Thick provision Eager Zeroed - PVSCSI Controller 2 on Dell SC9000 SSD Profile
NIC: VMXNet3
OS: Microsoft Windows Server 2012 R2 2017-12 Monthly Rollup (KB4054519)
DB: Microsoft SQL Server 2016 Enterprise Edition with SP1

Tests:
7-Zip v16.04 64-Bit (4 CPU Threads / 192 MB Dictionary)
HammerDB v2.23 (32 Warehouses, 8 vUsers, Rampup time 2 min, Test Duration 10 min)

Here are the results:

[Result screenshots: three patched / unpatched pairs; images not preserved]


In a heavier test, the updated system shows slightly lower performance:

HammerDB v2.23 (Warehouses: 128; vUsers: 24; Total Transactions per User: 1000000; Rampup time: 2 min; Test Duration: 10 min; User Delay: 500ms; Repeat Delay: 500 ms)

[Result screenshots: two unpatched runs, then two patched runs; images not preserved]


SQL: in some tests, the patched system's results are higher (faster) than the unpatched system's. This is very strange. I cannot explain it; perhaps the cumulative update brought along something else that optimizes performance. Yes, the load was synthetic, but it is quite similar to what we see in production, so the testing method itself is unlikely to be the cause.

In the heavier tests the performance drop is noticeable, but it did not exceed 10%.
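For anyone repeating this comparison on their own benchmark results, here is a minimal sketch of the arithmetic behind a statement like "did not exceed 10%". The function names and the default threshold parameter are hypothetical, and it assumes a throughput metric where higher is better (such as transactions per minute).

```python
def relative_change(unpatched, patched):
    """Percent change of the patched result relative to the unpatched one.

    For throughput metrics (higher is better), a negative value means
    the patched system is slower.
    """
    if unpatched <= 0:
        raise ValueError("baseline must be positive")
    return (patched - unpatched) * 100.0 / unpatched


def within_budget(unpatched, patched, max_drop_percent=10.0):
    """True if the patched result dropped by no more than the budget."""
    return relative_change(unpatched, patched) >= -max_drop_percent
```

With made-up figures, a baseline of 100000 and a patched result of 93000 is a drop of about 7%, so `within_budget(100000, 93000)` is true, while a patched result of 85000 would fall outside the budget.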

Threat assessment


On a patched hypervisor, it doesn't matter which guest operating system (patched or not) the cloud client is running. That is, after the updates the cloud is protected from Spectre and Meltdown. The updates are already done: the entire infrastructure took 4 days in total, with no downtime, by carefully migrating VMs back and forth.
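The "migrate back and forth" procedure can be sketched as a small simulation. This is an assumption-laden illustration, not the tooling we used: in practice the migration is done by the hypervisor platform. The function `rolling_patch`, the `capacity` limit (a maximum number of VMs per host), and the placement dictionary are all hypothetical.

```python
def rolling_patch(placement, capacity):
    """Patch hosts one at a time without ever patching a host that runs VMs.

    placement: dict mapping host name -> list of VM names.
    capacity:  hypothetical maximum number of VMs a host can hold.
    Returns (order in which hosts were patched, final placement).
    """
    placement = {host: list(vms) for host, vms in placement.items()}
    patched = []
    for host in list(placement):
        evacuated = []  # (vm, temporary host) pairs, so VMs can return
        for vm in list(placement[host]):
            others = [h for h in placement
                      if h != host and len(placement[h]) < capacity]
            if not others:
                raise RuntimeError(f"no spare capacity to evacuate {host}")
            # Move the VM to the least loaded host that still has room.
            dest = min(others, key=lambda h: len(placement[h]))
            placement[host].remove(vm)
            placement[dest].append(vm)
            evacuated.append((vm, dest))
        # The host is empty here, so it can be patched and rebooted safely.
        patched.append(host)
        # Bring the evacuated VMs back to their original host.
        for vm, dest in evacuated:
            placement[dest].remove(vm)
            placement[host].append(vm)
    return patched, placement
```

The sketch makes one constraint visible: a rolling update without downtime only works if the rest of the cluster has enough spare capacity to absorb one host's VMs at a time.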

We found no signs that the vulnerabilities had been exploited before the protection was in place.

A detailed assessment suggests there is almost no reason to use this vulnerability against large clouds: a targeted attack is simply very hard to pull off. Yes, an attacker could deploy a process that polls the memory of the kernel or of someone else's process, but only on the physical host where both the attacker's virtual machine and the victim's are running. That is, the attacker needs to:

  1. Know that the target is in this cloud.
  2. Bring in a malicious process that in-stream protection does not detect.
  3. Deploy it on the same physical host as the target.

The last step is extremely unlikely unless you buy at least half of the public cloud's resources. VMware DRS does run, but it moves only a small share of the VMs.
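To get a feel for how the odds of step 3 scale, here is a rough sketch under a deliberately simple assumption: every attacker VM lands on a host chosen uniformly at random and independently of the others, and the target sits on exactly one host. Real schedulers and DRS do not place VMs this way, so the functions below (hypothetical names) illustrate the shape of the problem rather than our actual risk figures.

```python
def colocation_probability(hosts, attacker_vms):
    """Chance that at least one attacker VM shares a host with the target,
    assuming independent, uniformly random placement across `hosts` hosts."""
    if hosts < 1 or attacker_vms < 0:
        raise ValueError("need at least one host and a non-negative VM count")
    return 1.0 - (1.0 - 1.0 / hosts) ** attacker_vms


def vms_needed(hosts, target_probability):
    """Smallest number of attacker VMs that reaches the target probability."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target probability must be strictly between 0 and 1")
    vms = 0
    while colocation_probability(hosts, vms) < target_probability:
        vms += 1
    return vms
```

Under these assumptions, a single VM in a 100-host cluster has about a 1% chance of landing next to the target, and `vms_needed(100, 0.5)` comes out at 69 VMs for even odds.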

Infrastructure services run in a separate resource cluster, i.e. the hosts where client virtual machines run are physically separate from the hosts where infrastructure virtual machines run.

Those in private clouds had nothing to worry about at all. The only thing they share with other systems is the storage system (and the vulnerability cannot read anything from disk); their set of hosts is their own.



When you need to scale, add clean hosts.

Bottom line: we are now protected from the main threat, and we recommend that customers patch their OS, if only because keeping the OS up to date is good practice. On closer examination the vulnerability looks glaring, but not as dangerous or alarming as it seemed in the first days.

This article was written by Dmitry Maximov, head of the cloud operations service at Technoserv Cloud.

Source: https://habr.com/ru/post/346524/

