
Using huge memory pages in PHP 7

Paging is a way to manage memory allocated to user processes. All memory accesses are virtual; the translation of virtual addresses into physical addresses is performed by the OS together with the hardware MMU.

With paging, memory is divided into blocks of fixed size. On Linux on x86/64 platforms the page size is usually 4 KB. For each process the kernel maintains a page table whose entries (page table entries) map virtual page addresses to physical memory. To avoid walking this table on every memory access (which would mean extra lookups for each memory request), a small cache is used: the translation lookaside buffer (TLB). This hardware component sits in the MMU and works extremely quickly and efficiently. On each access the TLB is searched for the mapping between the virtual page and physical memory. If the required entry is not there (a TLB miss), the page table has to be walked, the mapping looked up and the TLB updated before the requested data can be fetched from memory.
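
As a quick illustration (my own small sketch, not from the original article), the standard page size mentioned above can be queried from a program with sysconf():

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Query the size of a standard page; 4096 bytes on typical x86/64 Linux */
    long page_size = sysconf(_SC_PAGESIZE);
    printf("standard page size: %ld bytes\n", page_size);
    return 0;
}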

If you want to learn more about virtual memory management, there are dedicated publications on the subject. In the meantime, let's look at how PHP 7 works with huge pages.

Why do we need huge pages?


It's simple: the larger the page, the more data it covers, so a single page table entry maps a larger amount of memory. It also reduces the likelihood of a TLB miss, because each TLB entry now "covers" more data. The Linux kernel has supported huge pages since version 2.6.20. A huge page is usually 512 times the standard one: 2 MB instead of 4 KB. Most often the kernel performs so-called transparent huge page mapping: virtual memory is still divided into standard 4 KB pages, but from time to time a group of consecutive pages is merged into one huge page. This typically happens when a process works with a large contiguous address range. But be careful: such memory can be returned to the operating system in small chunks, which breaks the huge page, and the kernel has to roll back the merge, splitting it again into 512 pages of 4 KB each.
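
As a side note (a minimal sketch of mine, assuming a Linux kernel built with transparent huge page support), you can check how transparent huge pages are configured by reading /sys/kernel/mm/transparent_hugepage/enabled:

#include <stdio.h>

int main(void)
{
    /* Shows e.g. "always [madvise] never"; the bracketed word is the active THP mode */
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    char line[128];

    if (!f) {
        perror("THP interface not available");
        return 1;
    }
    if (fgets(line, sizeof(line), f)) {
        printf("transparent_hugepage: %s", line);
    }
    fclose(f);
    return 0;
}
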
The user process itself can also ask for huge pages. If you are sure you can fill an entire huge page with data, it is better to request one from the kernel explicitly. Huge pages make memory management easier, because the kernel has fewer page table entries to deal with; fewer TLB entries are needed as well, so the system as a whole works faster and more efficiently.

OPCache to the rescue


While working on PHP 7, we spent a lot of effort on making memory usage more efficient. Critical internal structures were rewritten to make better use of CPU caches: spatial locality has improved, so related data tends to sit next to each other in the cache and the engine accesses main memory less often. The OPCache extension, in turn, gained new options for working with huge pages.

Requesting huge page allocation


In the Unix world there are two APIs relevant to huge page allocation. It is preferable to use mmap(), since it actually allocates huge pages. There is also madvise(), which only gives hints (recommendations) to the kernel about backing a region of memory with huge pages, with no guarantees.
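
To illustrate the hint-based path (a sketch of my own, assuming a kernel with transparent huge pages enabled in at least madvise mode), one can map an ordinary anonymous region and merely advise the kernel to back it with huge pages; the kernel is free to ignore the hint:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define SIZE (32 * 1024 * 1024) /* 32 MB anonymous region */

int main(void)
{
    void *addr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }

    /* Only a recommendation: the kernel may or may not use huge pages here */
    if (madvise(addr, SIZE, MADV_HUGEPAGE) == -1) {
        perror("madvise(MADV_HUGEPAGE)");
    }

    /* Touch the memory so pages actually get allocated (and possibly merged) */
    memset(addr, 0, SIZE);

    printf("mapped %d bytes at %p, MADV_HUGEPAGE requested\n", SIZE, addr);
    munmap(addr, SIZE);
    return 0;
}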

Before requesting a huge page, make sure the kernel supports huge pages and that a pool of them has actually been reserved.

Using sysctl, configure vm.nr_hugepages, then check the availability of huge pages with cat /proc/meminfo:

> cat /proc/meminfo
HugePages_Total:      20
HugePages_Free:       20
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

In this example, 20 huge pages of 2 MB each are available. Linux on the x86/64 platform can also work with pages of up to 1 GB, although that size is not really recommended for PHP, unlike DBMSs, where such sizes can pay off.
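
For completeness, here is a hedged sketch (my own, not from the article) of how a 1 GB page could be requested with MAP_HUGE_1GB on recent kernels; it only works if 1 GB pages have been reserved (usually at boot) and the CPU supports them:

#include <stdio.h>
#include <sys/mman.h>

/* MAP_HUGE_1GB may be missing from older libc headers: it encodes
   log2(1 GB) = 30 shifted left by MAP_HUGE_SHIFT (26) in the mmap flags */
#ifndef MAP_HUGE_1GB
# define MAP_HUGE_1GB (30 << 26)
#endif

#define SIZE (1024UL * 1024 * 1024) /* one 1 GB page */

int main(void)
{
    void *addr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                      -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap with 1GB huge page");
        return 1;
    }
    printf("got a 1GB-page backed mapping at %p\n", addr);
    munmap(addr, SIZE);
    return 0;
}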

Now we can use the API. To back part of memory with a huge page, the address range must be aligned on a huge page boundary; this is a good idea for CPU efficiency in any case. After that you can ask the kernel for the mapping. In the following example the alignment is done by hand in C, with the buffer taken from the heap. For the sake of portability we will not use existing alignment helpers such as posix_memalign().

#include <stdio.h>
#include <sys/mman.h>
#include <string.h>
#include <stdlib.h>

#define ALIGN 1024*1024*2  /* We assume huge pages are 2Mb */
#define SIZE  1024*1024*32 /* Let's allocate 32Mb */

int main(int argc, char *argv[])
{
    void *addr;
    void *buf = NULL;
    void *aligned_buf;

    /* As we're gonna align on 2Mb, we need to allocate 34Mb
       if we want to be sure we can use huge pages on 32Mb total */
    buf = malloc(SIZE + ALIGN);
    if (!buf) {
        perror("Could not allocate memory");
        exit(1);
    }
    printf("buf is at: %p\n", buf);

    /* Align on ALIGN boundary */
    aligned_buf = (void *) ( ((unsigned long)buf + ALIGN - 1) & ~(ALIGN - 1) );
    printf("aligned buf: %p\n", aligned_buf);

    /* Turn the address to huge page backed address, using MAP_HUGETLB */
    addr = mmap(aligned_buf, SIZE, PROT_READ | PROT_WRITE,
                MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB | MAP_FIXED, -1, 0);
    if (addr == MAP_FAILED) {
        printf("failed mapping, check address or huge page support\n");
        exit(0);
    }
    printf("mmapped: %p with huge page usage\n", addr);

    return 0;
}

If you are familiar with C, there is nothing special to explain here. The memory is not explicitly freed, since the process exits right away anyway; the example only illustrates the idea.

While the process that allocated the memory is still running, you can see the huge pages that the kernel has reserved:

HugePages_Total:      20
HugePages_Free:       20
HugePages_Rsvd:       16
HugePages_Surp:        0
Hugepagesize:       2048 kB

They are only reserved because a huge page is not actually backed by physical memory until data is written to it. Here 16 pages are marked as reserved: 16 x 2 MB = 32 MB, exactly the amount we mapped with mmap().
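
To see the reserved pages actually being consumed (a small sketch of my own, extending the example above), write to such a mapping and look at /proc/meminfo again: HugePages_Free drops once the pages are faulted in:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define SIZE (32 * 1024 * 1024) /* 32 MB = 16 huge pages of 2 MB */

int main(void)
{
    void *addr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Writing faults the huge pages in: they go from "reserved" to actually used */
    memset(addr, 0, SIZE);

    /* Pause so /proc/meminfo can be inspected from another terminal */
    printf("pages touched, press Enter to exit\n");
    getchar();

    munmap(addr, SIZE);
    return 0;
}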

Placing the PHP 7 code segment into huge pages


The PHP 7 code segment is quite large. On my LP64 x86/64 machine it is about 9 MB (debug build):

> cat /proc/8435/maps
00400000-00db8000 r-xp 00000000 08:01 4196579   /home/julien.pauli/php70/nzts/bin/php  /* text segment */
00fb8000-01056000 rw-p 009b8000 08:01 4196579   /home/julien.pauli/php70/nzts/bin/php
01056000-01073000 rw-p 00000000 00:00 0
02bd0000-02ce8000 rw-p 00000000 00:00 0         [heap]
... ... ...

In this example the text segment occupies the range from 00400000 to 00db8000, which means the PHP binary contains more than 9 MB of machine code. PHP keeps evolving, gaining functions, and so contains more and more C code compiled to native code.

Let's look at the properties of this memory segment. It is mapped using traditional 4 KB pages:

> cat /proc/8435/smaps
00400000-00db8000 r-xp 00000000 08:01 4196579   /home/julien.pauli/php70/nzts/bin/php
Size:               9952 kB   /* virtual size */
Rss:                1276 kB   /* resident in physical memory */
Pss:                1276 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:      1276 kB
Private_Dirty:         0 kB
Referenced:         1276 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB   /* page size is 4 KB */
MMUPageSize:           4 kB
Locked:                0 kB

The kernel did not use transparent huge pages for this segment; perhaps it will do so later, as process 8435 keeps running. We will not go into how the kernel manages huge pages on its own; instead, with the help of OPCache, we can remap this segment onto huge pages ourselves.

Huge pages are a good fit here, since the code segment neither changes in size nor moves during the lifetime of the process. Of our 9952 KB, 8 MB can be placed in four pages of 2 MB each, and the remainder left on regular 4 KB pages.
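
As a worked example of that arithmetic (a sketch of mine using the addresses from the maps output above), rounding the start of the segment up and the end down to 2 MB boundaries gives exactly the 8 MB huge-page region and the 1760 KB leftover:

#include <stdio.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)

int main(void)
{
    unsigned long start = 0x00400000; /* text segment start from /proc/<pid>/maps */
    unsigned long end   = 0x00db8000; /* text segment end */

    /* Round start up and end down to 2 MB boundaries */
    unsigned long seg_start = (start + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
    unsigned long seg_end   = end & ~(HUGE_PAGE_SIZE - 1);

    printf("huge-page region: %lx-%lx (%lu KB, %lu pages of 2 MB)\n",
           seg_start, seg_end,
           (seg_end - seg_start) / 1024,
           (seg_end - seg_start) / HUGE_PAGE_SIZE);
    printf("left on 4 KB pages: %lu KB\n", (end - seg_end) / 1024);
    return 0;
}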

Remapping the code segment onto huge pages


If the opcache.huge_code_pages=1 setting is enabled in php.ini, OPCache runs the following function at startup:


static void accel_move_code_to_huge_pages(void)
{
    FILE *f;
    long unsigned int huge_page_size = 2 * 1024 * 1024;

    f = fopen("/proc/self/maps", "r");
    if (f) {
        long unsigned int start, end, offset, inode;
        char perm[5], dev[6], name[MAXPATHLEN];
        int ret;

        ret = fscanf(f, "%lx-%lx %4s %lx %5s %ld %s\n", &start, &end, perm, &offset, dev, &inode, name);
        if (ret == 7 && perm[0] == 'r' && perm[1] == '-' && perm[2] == 'x' && name[0] == '/') {
            long unsigned int seg_start = ZEND_MM_ALIGNED_SIZE_EX(start, huge_page_size);
            long unsigned int seg_end = (end & ~(huge_page_size-1L));

            if (seg_end > seg_start) {
                zend_accel_error(ACCEL_LOG_DEBUG, "remap to huge page %lx-%lx %s \n", seg_start, seg_end, name);
                accel_remap_huge_pages((void*)seg_start, seg_end - seg_start, name, offset + seg_start - start);
            }
        }
        fclose(f);
    }
}

OPCache opens /proc/self/maps and looks for the process's code segment there. There is no other way to obtain this information without relying on kernel-specific interfaces, and procfs is available on practically every Unix system today.

We scan the file, find the code segment and align its boundaries to huge page boundaries. Then accel_remap_huge_pages() is called with the aligned range:

#if defined(MAP_HUGETLB) || defined(MADV_HUGEPAGE)
static int accel_remap_huge_pages(void *start, size_t size, const char *name, size_t offset)
{
    void *ret = MAP_FAILED;
    void *mem;

    mem = mmap(NULL, size,
               PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS,
               -1, 0);
    if (mem == MAP_FAILED) {
        return -1;
    }
    memcpy(mem, start, size);

# ifdef MAP_HUGETLB
    ret = mmap(start, size,
               PROT_READ | PROT_WRITE | PROT_EXEC,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_HUGETLB,
               -1, 0);
# endif

# ifdef MADV_HUGEPAGE
    if (ret == MAP_FAILED) {
        ret = mmap(start, size,
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                   -1, 0);
        if (-1 == madvise(start, size, MADV_HUGEPAGE)) {
            munmap(mem, size);
            return -1;
        }
    }
# endif

    if (ret == start) {
        memcpy(start, mem, size);
        mprotect(start, size, PROT_READ | PROT_EXEC);
    }
    munmap(mem, size);

    return (ret == start) ? 0 : -1;
}
#endif

Everything is quite simple. We create a temporary buffer (mem), copy the code into it, then try to remap the aligned region onto huge pages with mmap() and MAP_HUGETLB. If that fails, we fall back to hinting the kernel with madvise(MADV_HUGEPAGE). Once the region is remapped, the code is copied back, the protection is restored to read+execute, and the temporary buffer is unmapped.

00400000-00c00000 r-xp 00000000 00:0b 1008956    /anon_hugepage
Size:               8192 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:     2048 kB
MMUPageSize:        2048 kB
Locked:                0 kB
00c00000-00db8000 r-xp 00800000 08:01 4196579    /home/julien.pauli/php70/nzts/bin/php
Size:               1760 kB
Rss:                 224 kB
Pss:                 224 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:       224 kB
Private_Dirty:         0 kB
Referenced:          224 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB

8 MB of the code segment now sits on four huge pages, and the remaining 1760 KB stays on standard ones. On my machine this gave roughly a 3% performance gain for the Zend engine under heavy load.

When using huge pages, the kernel manages fewer page table entries and the TLB is used more effectively, but keep in mind that the pages have to be reserved in advance (vm.nr_hugepages).


Conclusion


Now it's clear how the OPCache extension for PHP 7 helps improve performance by taking advantage of a now common memory management technique: huge pages.

By the way, a number of DBMSs (for example, Oracle and PostgreSQL) have been taking advantage of huge pages for years.

Source: https://habr.com/ru/post/270685/

