
Writing PHP programs using fork()

PHP Parallel Programs


Previously, this article was titled “Writing multi-threaded programs in PHP”. In PHP, there is exactly one “normal” way to write applications that use multiple cores or processors: fork(). I will describe how to use the fork() system call in PHP through the pcntl extension. As an example, we will write a fairly fast parallel implementation of grep (with a speed similar to find . -type f -print0 | xargs -0 -P $NUM_PROCS grep $EXPR).

Implementation


The PHP implementation of this call, pcntl_fork(), is very simple:

    PHP_FUNCTION(pcntl_fork)
    {
        pid_t id;

        id = fork();
        if (id == -1) {
            PCNTL_G(last_error) = errno;
            php_error_docref(NULL TSRMLS_CC, E_WARNING, "Error %d", errno);
        }

        RETURN_LONG((long) id);
    }

What is the fork() system call


fork() is a system call on *nix systems that makes a complete copy of the current process. It returns twice: the parent receives the PID of the child, and the child receives 0. Oddly enough, in many cases this is all you need to write applications that use multiple CPUs.

    $ php -r '$pid = pcntl_fork(); echo posix_getpid() . ": Fork returned $pid\n";'
    9545: Fork returned 9546
    9546: Fork returned 0
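The two return values are usually handled with a three-way branch: error, child, parent. Below is a minimal sketch of that branch, assuming the pcntl extension is available (CLI on a *nix system). The helper name runInChild() is hypothetical and is not part of PHP or of the original article.

```php
<?php
// Minimal sketch: branch on the return value of pcntl_fork().
// runInChild() is a hypothetical helper introduced only for illustration.

/**
 * Runs $task in a child process and returns the child's exit code
 * to the parent (or -1 if the child did not exit normally).
 */
function runInChild(callable $task): int
{
    $pid = pcntl_fork();

    if ($pid === -1) {
        // fork() failed: no child process was created
        throw new RuntimeException('Cannot fork');
    }

    if ($pid === 0) {
        // Child: fork() returned 0. The task's return value becomes the exit code.
        exit((int) $task());
    }

    // Parent: fork() returned the PID of the child, so wait for that child.
    pcntl_waitpid($pid, $status);

    return pcntl_wifexited($status) ? pcntl_wexitstatus($status) : -1;
}

$code = runInChild(function () {
    return 7;
});
echo "Child exited with code $code\n"; // prints "Child exited with code 7"
```

Waiting for the child with pcntl_waitpid() also prevents it from lingering as a zombie process after it exits.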


Pitfalls when using fork()


fork() does its work without regard to what the user process holds in memory: it copies everything, including, for example, the list of functions registered via atexit (register_shutdown_function in PHP). Example:

    $ php -r 'register_shutdown_function(function() { echo "Exited!\n"; }); pcntl_fork();'
    Exited!
    Exited!

Unfortunately, at the end of a script PHP runs destructors (including the internal destructors of database connection resources), so when one process exits it closes the connection that the other process still shares. Example for the mysqli extension:

    <?php /* test.php */
    $conn = new mysqli(..., "mysql") or die("Cannot connect\n");
    $pid = pcntl_fork();
    if ($pid > 0) {
        echo "Parent exiting\n";
        exit(0);
    }
    echo "Sending query\n";
    $res = $conn->query("SHOW TABLES") or die("Cannot get query result\n");
    print_r($res->fetch_all());

    /*
    $ php test.php
    Parent exiting
    Sending query
    Warning: mysqli::query(): MySQL server has gone away in test.php on line 9
    Warning: mysqli::query(): Error reading result set's header in test.php on line 9
    Cannot get query result
    */

The output will not necessarily match the above. Sometimes the child manages to run its query before the parent closes the connection, and everything works as expected.

Dealing with deferred execution of functions and destructors


The deferred-execution problem can be solved if you know exactly what you want. In C, for example, there is the _exit() function, which terminates the process without running any registered handlers. Unfortunately, PHP has no such function, but its behavior can be partially emulated with signals:

    function _exit() {
        posix_kill(posix_getpid(), SIGTERM);
    }

This “hack” is enough to keep the database connection alive for two PHP processes at once, although in practice it is, of course, better not to do this :)

    <?php /* test.php */
    $conn = new mysqli(..., "mysql") or die("Cannot connect\n");

    function _exit() {
        posix_kill(posix_getpid(), SIGTERM);
    }

    function show_tables() {
        global $conn;
        echo "Sending query\n";
        $res = $conn->query("SHOW TABLES") or die("Cannot get query result\n");
        echo "Tables count: " . $res->num_rows . "\n";
    }

    $pid = pcntl_fork();
    if ($pid > 0) {
        show_tables();
        _exit();
    }
    sleep(1);
    show_tables();

    /*
    $ php test.php
    Sending query
    Tables count: 24
    Terminated: 15   <--- the parent, killed by _exit()
    $ Sending query
    Tables count: 24
    */
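To see that the signal-based _exit() really skips shutdown functions, one can fork a child that registers a shutdown function and then kills itself. The sketch below assumes the pcntl and posix extensions and the default SIGTERM disposition (no handler installed). The helper childTerminationSignal() and the marker-file path are hypothetical names used only for this illustration.

```php
<?php
// Sketch: a process that kills itself with SIGTERM does not run its shutdown functions.

function _exit()
{
    posix_kill(posix_getpid(), SIGTERM);
}

/**
 * Hypothetical helper: forks a child that registers a shutdown function
 * (which would create $markerFile) and then calls _exit().
 * Returns the number of the signal that terminated the child,
 * or 0 if the child exited normally.
 */
function childTerminationSignal(string $markerFile): int
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('Cannot fork');
    }

    if ($pid === 0) {
        register_shutdown_function(function () use ($markerFile) {
            file_put_contents($markerFile, "shutdown function ran\n");
        });
        _exit();
        // Only reached if SIGTERM did not terminate the process.
        exit(0);
    }

    pcntl_waitpid($pid, $status);

    return pcntl_wifsignaled($status) ? pcntl_wtermsig($status) : 0;
}

$marker = sys_get_temp_dir() . '/fork_exit_demo_' . posix_getpid();
$signal = childTerminationSignal($marker);

echo 'Child terminated by SIGTERM: ' . ($signal === SIGTERM ? 'yes' : 'no') . "\n";
echo 'Shutdown function ran: ' . (file_exists($marker) ? 'yes' : 'no') . "\n";
```

The same mechanism is why the shell prints "Terminated: 15" in the example above: from the outside, the process looks killed rather than exited, so its exit status is not a normal exit code.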

Writing grep


As an example, let's now write a simple version of grep that searches for a pattern in the files under the current directory.

    <?php
    /*
    Usage:
    $ php grep.php argv
    ./grep.php:$pattern = "/$argv[1]/m";
    */
    exec("find . -type f", $files, $retval); // get the list of files under the current directory
    $pattern = "/$argv[1]/m";
    foreach ($files as $file) {
        $fp = fopen($file, "rb");
        // treat the file as binary if its first 1024 bytes contain a NUL byte
        $is_binary = strpos(fread($fp, 1024), "\0") !== false;
        fseek($fp, 0);
        if ($is_binary) {
            if (preg_match($pattern, file_get_contents($file))) echo "$file: binary matches\n";
        } else {
            while (false !== ($ln = fgets($fp))) {
                if (preg_match($pattern, $ln)) echo "$file:$ln";
            }
        }
        fclose($fp);
    }

Writing a parallel version of grep


Now let's think about how to speed this program up by parallelizing it. It is easy to see that we can split the $files array (the file list) into several parts and process them independently. The same approach works whenever we have a large list of independent tasks: process number N simply takes every PROCESSES_NUM-th task, starting from the Nth. So we will write a more or less general function for this:

    define('PROCESSES_NUM', 2); // number of worker processes

    function parallelForeach($arr, $func) {
        for ($proc_num = 0; $proc_num < PROCESSES_NUM; $proc_num++) {
            $pid = pcntl_fork();
            if ($pid == 0) break;
        }

        if ($pid) {
            // parent: wait for all children to finish
            for ($i = 0; $i < PROCESSES_NUM; $i++) pcntl_wait($status);
            return;
        }

        // child number $proc_num: process every PROCESSES_NUM-th element, starting from $proc_num
        $l = count($arr);
        for ($i = $proc_num; $i < $l; $i += PROCESSES_NUM) $func($arr[$i]);
        exit(0);
    }
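The way work is divided between the children can be checked without forking at all. The sketch below reproduces only the index arithmetic of the child loop. The helper itemsForWorker() is a hypothetical name, and it assumes a list-style array with consecutive keys starting at 0, which is what exec() produces.

```php
<?php
// Sketch of the round-robin split used by the child loop in parallelForeach().
// No processes are created here: this only shows which elements each worker would get.

/**
 * Hypothetical helper: returns the elements that worker $procNum (0-based)
 * would process out of $arr when there are $procs workers in total.
 */
function itemsForWorker(array $arr, int $procNum, int $procs): array
{
    $items = [];
    $l = count($arr);
    for ($i = $procNum; $i < $l; $i += $procs) {
        $items[] = $arr[$i];
    }

    return $items;
}

$files = ['a.c', 'b.c', 'c.c', 'd.c', 'e.c'];
for ($n = 0; $n < 2; $n++) {
    echo "worker $n: " . implode(' ', itemsForWorker($files, $n, 2)) . "\n";
}
// worker 0: a.c c.c e.c
// worker 1: b.c d.c
```

Interleaving the elements this way tends to balance the load better than cutting the list into contiguous halves when neighbouring files (for example, those in one directory) are of similar size.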


It remains to replace foreach() with our parallelForeach() function and add error handling:
Full source code
    <?php
    /* parallel-grep.php */
    define('PROCESSES_NUM', 2);

    if ($argc != 2) {
        fwrite(STDERR, "Usage: $argv[0] <pattern>\n");
        exit(1);
    }

    grep($argv[1]);

    function grep($pattern) {
        exec("find . -type f", $files, $retval);
        if ($retval) exit($retval);

        $pattern = "/$pattern/m";
        if (false === preg_match($pattern, '123')) {
            fwrite(STDERR, "Incorrect regular expression\n");
            exit(1);
        }

        parallelForeach($files, function($f) use ($pattern) {
            grepFile($pattern, $f);
        });
        exit(0);
    }

    function grepFile($pattern, $file) {
        $fp = fopen($file, "rb");
        if (!$fp) {
            fwrite(STDERR, "Cannot read $file\n");
            return;
        }

        $binary = strpos(fread($fp, 1024), "\0") !== false;
        fseek($fp, 0);
        if ($binary) {
            if (preg_match($pattern, file_get_contents($file))) echo "$file: binary matches\n";
        } else {
            while (false !== ($ln = fgets($fp))) {
                if (preg_match($pattern, $ln)) echo "$file:$ln";
            }
        }
        fclose($fp);
    }

    function parallelForeach($arr, $func) {
        for ($proc_num = 0; $proc_num < PROCESSES_NUM; $proc_num++) {
            $pid = pcntl_fork();
            if ($pid < 0) {
                fwrite(STDERR, "Cannot fork\n");
                exit(1);
            }
            if ($pid == 0) break;
        }

        if ($pid) {
            for ($i = 0; $i < PROCESSES_NUM; $i++) {
                pcntl_wait($status);
                $exitcode = pcntl_wexitstatus($status);
                if ($exitcode) exit(1);
            }
            return;
        }

        $l = count($arr);
        for ($i = $proc_num; $i < $l; $i += PROCESSES_NUM) $func($arr[$i]);
        exit(0);
    }
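The regular-expression check in grep() relies on preg_match() returning false when the pattern cannot be compiled. Here is a small sketch of that check in isolation. The helper buildPattern() is a hypothetical name, and silencing the warning with @ is a choice made for this example rather than something the script above does.

```php
<?php
// Sketch: validate a user-supplied expression the same way grep() does.

/**
 * Hypothetical helper: wraps $expr in "/" delimiters with the "m" modifier
 * and returns the resulting pattern, or null if PCRE cannot compile it.
 */
function buildPattern(string $expr): ?string
{
    $pattern = "/$expr/m";

    // preg_match() returns false (and emits a warning, silenced here) on a compile error
    return @preg_match($pattern, '') === false ? null : $pattern;
}

var_dump(buildPattern('^PHP_FUNCTION')); // string(17) "/^PHP_FUNCTION/m"
var_dump(buildPattern('(unclosed'));     // NULL
var_dump(buildPattern('a/b'));           // NULL
```

The last case shows a limitation of wrapping the expression in "/" delimiters: an unescaped "/" inside the expression ends the pattern early, so the script rejects such expressions as incorrect.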

Let's check our grep against the PHP 5.3.10 source code:

    $ php ~/parallel-grep.php '^PHP_FUNCTION' | head
    ./ext/calendar/calendar.c:PHP_FUNCTION(cal_info)
    ./ext/calendar/calendar.c:PHP_FUNCTION(cal_days_in_month)
    ./ext/calendar/calendar.c:PHP_FUNCTION(cal_to_jd)
    ./ext/calendar/calendar.c:PHP_FUNCTION(cal_from_jd)
    ./ext/calendar/calendar.c:PHP_FUNCTION(jdtogregorian)
    ./ext/calendar/calendar.c:PHP_FUNCTION(gregoriantojd)
    ./ext/calendar/calendar.c:PHP_FUNCTION(jdtojulian)
    ./ext/calendar/calendar.c:PHP_FUNCTION(juliantojd)
    ./ext/calendar/calendar.c:PHP_FUNCTION(jdtojewish)
    ./ext/calendar/calendar.c:PHP_FUNCTION(jewishtojd)

    $ time php ~/parallel-grep.php '^PHP_FUNCTION' | wc -l
    4056

    real 0m2.073s
    user 0m3.265s
    sys 0m0.550s

    $ time grep -R '^PHP_FUNCTION' . | wc -l
    4056

    real 0m3.646s
    user 0m3.415s
    sys 0m0.209s

    $ time find . -type f -print0 | xargs -0 -P 2 grep '^PHP_FUNCTION' | wc -l
    4056

    real 0m1.895s
    user 0m3.247s
    sys 0m0.249s


It works! I have described one of the most frequently used patterns for parallel programming in PHP: parallel processing of a task queue. I hope this article helps someone stop being afraid of writing parallel PHP applications when the task allows such a decomposition, as in the grep example. Thank you.

Source: https://habr.com/ru/post/148688/

