
PHP and realpath_cache

From the translator: while investigating an error that appeared after a service deployment the other day, I came across this wonderful article about the mechanism PHP uses to cache file path information. I offer the community a translation.

Have you heard about the realpath_cache_get() and realpath_cache_size() PHP functions? And maybe about the parameters realpath_cache_size and realpath_cache_ttl in php.ini?

The realpath cache is a fairly important PHP mechanism to keep in mind, especially when you work with symbolic links, for example during project deployment. Configuring the realpath cache can significantly affect server performance and the load on the server's disk subsystem. The cache was introduced in PHP 5.1, when the first PHP frameworks began to appear.
Next, we will look at how it all works under the hood and how to live with it. The rest of the article contains a lot of links to the source code.


Remembering the stat() system call


Do you know how your system works? Let me refresh your memory. When you work with a path, the kernel and the file system have to understand what you want from them. When you use a path to access a file, your library or the kernel must resolve it. Resolving a path means finding out what it refers to: is it a file, a directory, or maybe a link?

One way to do this is to ask the system about the type of the file and, if it turns out to be a link, to find out about the target file. When you use relative paths (like "../hey/./you/../foobar"), you must first obtain the absolute path, and only then get information about the final file.

Usually, the realpath() C function is used to resolve a relative path. It, in turn, makes the stat() system call.

The stat() call is quite heavy. First, it is a system call, which entails an interrupt and a context switch. Second, it works with data on a slow disk. In the kernel code you can find calls to the file system's inode->getattr(). Usually the kernel uses its own cache (the buffer cache), so the performance impact should be negligible. However, on a busy server the cache may not contain the necessary information, which increases the load on the disk subsystem. It is therefore in our best interest to avoid this.
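To make the path resolution described above concrete, here is a minimal sketch using PHP's own realpath() wrapper. The directory and file names are hypothetical and are created in a throwaway temporary directory; the sketch assumes a Unix-like system where symlink() is permitted.

```php
<?php
// Build a throwaway directory tree (all names here are hypothetical).
$base = sys_get_temp_dir() . '/realpath_demo_' . uniqid();
mkdir("$base/hey/you", 0777, true);
mkdir("$base/start");
touch("$base/hey/foobar");
symlink("$base/hey/foobar", "$base/link");

// A path with "." and ".." segments is reduced to its canonical form.
$resolved = realpath("$base/start/../hey/./you/../foobar");

// A symbolic link is followed to its target.
$viaLink = realpath("$base/link");

// is_link() inspects the link itself, while stat() follows it to the target.
$isLink = is_link("$base/link");
$sameInode = stat("$base/link")['ino'] === stat("$base/hey/foobar")['ino'];

echo $resolved, PHP_EOL, $viaLink, PHP_EOL;

// Clean up the temporary tree.
unlink("$base/link");
unlink("$base/hey/foobar");
rmdir("$base/hey/you");
rmdir("$base/hey");
rmdir("$base/start");
rmdir($base);
```

Both calls end up at the same canonical path, which is the kind of result the realpath cache stores.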

What does PHP do?


Projects written in PHP are usually spread across many files. Today we use tons of classes, which means tons of files (because we use one file per class). Whether or not we use autoloading, we must include all these files in order to read the code inside them, and that requires a stat() call to get information about each file. So when we access a file from PHP, it first resolves the paths and links, then gets the file information via the stat() system call, and then saves the result in its own cache, called the realpath cache.

The realpath cache stores only the result of path resolution. All other information about the file, such as the owner, group, access rights and timestamps, is stored in a separate cache, the stat cache. Let's take a look at the sources: when a file is accessed, the php_resolve_path() function is called. It calls tsrm_realpath(), which internally executes virtual_file_ex() and tsrm_realpath_r().

This is where interesting things happen: functions like realpath_cache_find() are called to look up the data stored in the cache for the requested file. The information is stored in the realpath_cache_bucket structure, which holds quite a lot of data:

    typedef struct _realpath_cache_bucket {
        unsigned long key;
        char *path;
        int path_len;
        char *realpath;
        int realpath_len;
        int is_dir;
        time_t expires;
    #ifdef PHP_WIN32
        unsigned char is_rvalid;
        unsigned char is_readable;
        unsigned char is_wvalid;
        unsigned char is_writable;
    #endif
        struct _realpath_cache_bucket *next;
    } realpath_cache_bucket;

If no data is found in the cache, the php_sys_lstat() function is called, which is a proxy for the lstat() system call. The result of this call is stored in the realpath cache.
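The lookup-then-fallback flow described above can be modelled in a few lines of user-land PHP. This is only a toy model to illustrate the idea: PHP's real cache is implemented in C, and the class name, method name and the 120-second lifetime used here are assumptions made for the example.

```php
<?php
// Toy user-land model of the lookup described above. This is NOT PHP's
// implementation (that lives in C); class and method names are made up.
final class ToyRealpathCache
{
    /** @var array<string, array{realpath: string, expires: int}> */
    private $entries = [];

    /** @var int Number of times the file system had to be asked. */
    public $misses = 0;

    /** @var int Lifetime of an entry in seconds (assumed value). */
    private $ttl;

    public function __construct(int $ttl = 120)
    {
        $this->ttl = $ttl;
    }

    /** Returns the resolved path, or false if the path does not exist. */
    public function resolve(string $path)
    {
        $now = time();

        // Cache hit: the entry exists and has not expired yet.
        if (isset($this->entries[$path]) && $this->entries[$path]['expires'] >= $now) {
            return $this->entries[$path]['realpath'];
        }

        // Cache miss: ask the file system and remember a successful answer.
        $this->misses++;
        $real = realpath($path);
        if ($real !== false) {
            $this->entries[$path] = ['realpath' => $real, 'expires' => $now + $this->ttl];
        }

        return $real;
    }
}

$cache  = new ToyRealpathCache();
$dir    = sys_get_temp_dir();
$first  = $cache->resolve($dir); // miss: goes to the file system
$second = $cache->resolve($dir); // hit: served from the array
```

Only the first call touches the file system; the second is answered from memory until the entry expires.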

PHP settings


So, from the PHP side, we need to know a few things about the realpath cache. First, the php.ini settings:
realpath_cache_size
realpath_cache_ttl
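Both settings can be read at runtime and compared with what the cache currently uses. The sketch below is one possible way to do it; iniSizeToBytes() is a hypothetical helper written for this example (it is not part of PHP) that converts php.ini shorthand such as "16K" to bytes.

```php
<?php
// Convert php.ini shorthand such as "16K" or "4M" to a number of bytes.
// Hypothetical helper for this example; it is not part of PHP.
function iniSizeToBytes(string $value): int
{
    $value  = trim($value);
    $number = (int) $value; // takes the leading digits, e.g. "16K" gives 16

    switch (strtoupper(substr($value, -1))) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number;
    }
}

$limit   = iniSizeToBytes((string) ini_get('realpath_cache_size'));
$ttl     = (int) ini_get('realpath_cache_ttl');
$used    = realpath_cache_size();        // bytes currently used by this process
$entries = count(realpath_cache_get());  // number of cached paths

printf(
    "realpath cache: %d of %d bytes used, %d entries, TTL %d s\n",
    $used,
    $limit,
    $entries,
    $ttl
);
```

Because the cache belongs to a single process, the numbers describe only the process that runs the script.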

The documentation recommends increasing these values on servers where the source code rarely changes. Keep in mind that the default cache size of 16 KB is tiny: a single request to a framework like Symfony2 will exhaust it. To keep the setting adequate, monitor the cache with realpath_cache_size() and realpath_cache_get(). If the available space fills up quickly, that is a clear reason to increase the cache size, for example to 1 MB. Once the cache is full, PHP falls back to issuing stat() calls, which directly affects performance.

The required cache size is difficult to calculate precisely. Having rummaged in the source code, we can conclude that each cache entry occupies `sizeof(realpath_cache_bucket) + path_len + 1` bytes, where path_len is the number of characters in the cached path.
For a 64-bit system (LP64), sizeof(realpath_cache_bucket) = 56 bytes.
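As a rough back-of-the-envelope aid, the per-entry cost can be turned into a small estimator. This sketch assumes that each entry costs the 56-byte bucket quoted above plus the path string and its terminating NUL byte; the function names are hypothetical, and the result should be treated as a rough lower bound rather than an exact figure.

```php
<?php
// Assumption: 56 bytes per bucket, the LP64 figure quoted in the text.
const BUCKET_SIZE = 56;

// Estimated bytes for one cache entry: struct + path + terminating NUL.
function estimateEntrySize(string $path): int
{
    return BUCKET_SIZE + strlen($path) + 1;
}

// Estimated bytes for a list of cached paths.
function estimateCacheSize(array $paths): int
{
    return array_sum(array_map('estimateEntrySize', $paths));
}

$paths = ['/home', '/home/julien', '/home/julien/www'];
printf("%d paths need roughly %d bytes\n", count($paths), estimateCacheSize($paths));
```

Feeding it the keys returned by realpath_cache_get() after a typical request gives a first idea of how large realpath_cache_size should be.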

There is another peculiarity. PHP resolves every path it encounters while running by breaking it into parts. If you request the file /home/julien/www/fooproject/app/web/entry.php, PHP will split it into every possible prefix, starting from the root. Thus, it will first cache /home, then /home/julien, then /home/julien/www, and so on.

Why? For starters, this is required to verify access at each level of the path. Second, many users build paths by concatenation, so PHP can check paths in parts, each time reusing an already cached entry. Access to the cache is very fast; details can be found in the tsrm_realpath_r() source. It is a recursive function, called by default on each element of the path.
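The splitting described above can be illustrated with a small helper that lists every prefix of an absolute Unix path, root first. The function name is hypothetical; PHP does this internally in C, so this is only a sketch of which entries end up in the cache.

```php
<?php
// Return every intermediate prefix of an absolute Unix path, root first.
// Hypothetical helper; it only illustrates which entries get cached.
function pathPrefixes(string $path): array
{
    $prefixes = [];
    $current  = '';

    foreach (explode('/', trim($path, '/')) as $segment) {
        if ($segment === '') {
            continue; // skip empty segments produced by "//" or by "/" alone
        }
        $current   .= '/' . $segment;
        $prefixes[] = $current;
    }

    return $prefixes;
}

print_r(pathPrefixes('/home/julien/www/fooproject/app/web/entry.php'));
```

A single deep file therefore costs one cache entry per directory level, which is why the cache fills up faster than the number of included files suggests.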

So, the first conclusion from the previous paragraph: the cache is good!

The second: hitting several pages of the site after deployment is a necessary step before opening the site to the public. This warms up not only the OPcode cache, but also the realpath cache and the page cache of the system kernel.

How do you clear the realpath cache? The function that performs this task is hidden from prying eyes. realpath_cache_clear()? No, such a function does not exist :(. But, in the best traditions of PHP, there is clearstatcache(true). The true argument is very important: the parameter is called $clear_realpath_cache and, obviously, does exactly what its name says.
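A minimal sketch of the effect of clearstatcache(true) on the current process: it resolves one path to make sure the cache is not empty, then compares the cache size before and after clearing.

```php
<?php
// Resolve a path so that the cache has at least one entry.
realpath(sys_get_temp_dir());
$before = realpath_cache_size();

// Passing true clears the realpath cache in addition to the stat cache.
clearstatcache(true);
$after = realpath_cache_size();

printf("realpath cache: %d bytes before, %d bytes after clearing\n", $before, $after);
```

clearstatcache() also accepts a file name as its second argument, which limits the clearing to that single path.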

Example


Take a simple, made-up example:

    <?php
    $f = @file_get_contents('/tmp/bar.php');
    echo "hello";
    var_dump(realpath_cache_get());

This is what it outputs:

    hello
    array(5) {
      ["/home/julien.pauli/www/realpath_example.php"]=>
      array(4) {
        ["key"]=>
        float(1.7251638834424E+19)
        ["is_dir"]=>
        bool(false)
        ["realpath"]=>
        string(43) "/home/julien.pauli/www/realpath_example.php"
        ["expires"]=>
        int(1404137986)
      }
      ["/home"]=>
      array(4) {
        ["key"]=>
        int(4353355791257440477)
        ["is_dir"]=>
        bool(true)
        ["realpath"]=>
        string(5) "/home"
        ["expires"]=>
        int(1404137986)
      }
      ["/home/julien.pauli"]=>
      array(4) {
        ["key"]=>
        int(159282770203332178)
        ["is_dir"]=>
        bool(true)
        ["realpath"]=>
        string(18) "/home/julien.pauli"
        ["expires"]=>
        int(1404137986)
      }
      ["/tmp"]=>
      array(4) {
        ["key"]=>
        float(1.6709564980243E+19)
        ["is_dir"]=>
        bool(true)
        ["realpath"]=>
        string(4) "/tmp"
        ["expires"]=>
        int(1404137986)
      }
      ["/home/julien.pauli/www"]=>
      array(4) {
        ["key"]=>
        int(5178407966190555102)
        ["is_dir"]=>
        bool(true)
        ["realpath"]=>
        string(22) "/home/julien.pauli/www"
        ["expires"]=>
        int(1404137986)
      }
    }


What do we see? The full path to the script is resolved in parts, starting from the root. Since the file /tmp/bar.php does not exist, there is no entry for it in the cache. However, the path /tmp has been resolved, so every subsequent request for files under it will be slightly faster than the first one.

In the array returned by the realpath_cache_get() function, you can see such important information as the expiration time of each entry. This value is calculated from the time the path was resolved and the realpath_cache_ttl setting.
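The expiration timestamps are easier to read as time left to live. A small sketch, assuming at least one path has already been resolved in the current process:

```php
<?php
// Make sure at least one path has been resolved in this process.
realpath(sys_get_temp_dir());

$now = time();
$remaining = [];
foreach (realpath_cache_get() as $path => $entry) {
    // "expires" is a Unix timestamp; the difference is the time left to live.
    $remaining[$path] = $entry['expires'] - $now;
}

foreach ($remaining as $path => $seconds) {
    printf("%-40s expires in %d s\n", $path, $seconds);
}
```

No entry should report more seconds than the configured realpath_cache_ttl.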

The key field is a hash of the resolved path, computed with a variant of the FNV algorithm. This is internal information that is hardly needed in practice. The hash is reported as either an int or a float, depending on whether it fits into the platform's integer range.

If you call clearstatcache(true), this array is emptied and PHP will again make a stat() system call for each requested file, including those that had already been cached.

Let's talk about the OPcode cache


Ready for another pitfall?

The realpath cache is tied to a specific process and is not stored in shared memory.

This means that whenever a cache entry expires or changes, or the cache is cleared manually, this has to happen in every running process separately. This is why users often run into trouble when deploying an application on servers that use an OPcode cache.

What usually happens during deployment? Most often, we simply switch a symbolic link from one version to another, for example from /www/deploy-a to /www/deploy-b. And here everyone usually forgets that the OPcode cache (at least OPCache and APC) relies on PHP's internal realpath cache. As a result, OPcode caches do not see the change in the symbolic link and only pick up the new target once the cached entries expire. Well, you already know the rest :)

The best solution found so far for avoiding this side effect is to prepare a separate pool of PHP workers and switch the load balancer to it, letting the old workers finish their requests normally. This isolates the two versions from each other and prevents stale cache entries from being used. Everything in the new pool, including the realpath cache and the OPcode cache, starts fresh. This trick is available at least with Lighttpd and Nginx, and it works successfully in production.

The end


I was asked to write a few lines about the realpath cache, most likely because of the problems that arise when deploying code. Well, now you know how it works and how to manage it.

P.S. from the translator:
From the ancient php-internals mailing lists:
Clear reset the cache? It’s not a problem.

Source: https://habr.com/ru/post/266909/

