Today let's look at PHP from a different angle and write an extension for it. Since Habr already has publications on this topic (here and here), we will not dwell on why this can be useful or what it is used for in practice. This article explains how to build a simple extension under Windows with Visual C++ and under Debian with GCC. I will also touch on working with PHP arrays inside extensions and compare the performance of an algorithm written in native PHP with the same algorithm written in C.
Compiling under Win32
So let's start with Windows.
As you know, the PHP developers use Visual C++ 9 (Visual Studio 2008) to build PHP for Windows, so we will use Visual Studio 2008 as well. The free Express edition is also suitable, and earlier or later versions of Visual Studio will probably work too.
What we need:
First, create a project of type Win32 Console Application and select DLL as the application type. Next, configure the dependencies and the linker paths:
Find the stdafx.h file in the project and replace its contents with the following:
#ifndef STDAFX
#define STDAFX
#define PHP_COMPILER_ID "VC9"
If you try to compile the project at this stage, you will get an error saying that main\config.w32.h is missing. You can get it either by running the main\configure.bat script or by taking it from the sources of, for example, PHP 5.2. In the latter case, do not forget to edit all the paths in this file and uncomment the "#define HAVE_SOCKLEN_T" directive. The project should now compile without errors.
Now let's write a hello world. Add the following to the .cpp file:
PHP_FUNCTION(test);

const zend_function_entry test_functions[] = {
    PHP_FE(test, NULL)
    {NULL, NULL, NULL}
};

zend_module_entry test_module_entry = {
    STANDARD_MODULE_HEADER,
Now load this module in PHP and try to run something like this:
php -r "test();"
This should print “hello habr”.
Compiling under *nix
Under *nix everything turned out to be simpler, as always. I will use Debian as the example; I expect the process to be the same on other systems.
We will need:
- PHP installed on the machine,
- the php-dev package installed. One command is enough for that:
apt-get install php5-dev
Create a directory for the extension somewhere, for example /test. In it, create two empty files:
config.m4
test.c
The first drives the build of the extension, and the second holds its source code. Put the following in config.m4:
PHP_ARG_ENABLE(test, Enable test support)

if test "$PHP_TEST" = "yes"; then
  AC_DEFINE(HAVE_TEST, 1, [You have test extension])
  PHP_NEW_EXTENSION(test, test.c, $ext_shared)
fi
Inside test.c add
#include "php.h"
After this line, copy in the contents of the .cpp file from the Windows version.
Now we go to the console and:
That's all. Now open php.ini and add the extension there:
extension = test.so
And check that it works with the command:
php -r "test();"
Argument handling and return values
First, let's look at how to accept arguments:
char* text;
int text_length;

if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &text, &text_length) == FAILURE) {
    return;
}
The format string ("s" here) specifies the expected types (all the options are listed here). In this case it is a string, which arrives as a char * together with an int length. The same page describes how to combine types and how to specify the number of arguments. All the parameters that follow are pointers to the variables that receive the passed values. For a string, both the string itself and its length are passed.
If the arguments passed to your function do not match the format, an E_WARNING is raised and zend_parse_parameters returns FAILURE; at that point you can return some value, for example an error message.
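To make the calling convention more tangible, here is a minimal sketch in plain C. It is not the Zend API: toy_arg and toy_parse are hypothetical names invented for this illustration, and the only point is to show how one character of the format string decides how many output pointers are consumed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a PHP argument: either a string or an integer. */
typedef struct {
    char kind;          /* 's' for string, 'l' for long */
    const char *str;    /* valid when kind == 's' */
    long num;           /* valid when kind == 'l' */
} toy_arg;

/*
 * toy_parse() is NOT the Zend API. It only mimics the calling convention:
 * each 's' in the format consumes two output pointers (const char **, int *),
 * each 'l' consumes one (long *). Returns 0 on success, -1 on a mismatch.
 */
static int toy_parse(const toy_arg *args, size_t argc, const char *fmt, ...)
{
    va_list out;
    size_t i;
    int status = 0;

    assert(fmt != NULL);
    if (strlen(fmt) != argc) {
        return -1;                      /* wrong number of arguments */
    }

    va_start(out, fmt);
    for (i = 0; i < argc && status == 0; i++) {
        if (fmt[i] != args[i].kind) {
            status = -1;                /* wrong argument type */
        } else if (fmt[i] == 's') {
            const char **text = va_arg(out, const char **);
            int *length = va_arg(out, int *);
            *text = args[i].str;
            *length = (int)strlen(args[i].str);
        } else if (fmt[i] == 'l') {
            long *value = va_arg(out, long *);
            *value = args[i].num;
        } else {
            status = -1;                /* unknown format character */
        }
    }
    va_end(out);
    return status;
}
```

With two arguments and the format "sl", a call such as toy_parse(args, 2, "sl", &text, &text_length, &n) fills three variables, which is why a single "s" in the real call is followed by both &text and &text_length.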
You can return both simple and complex types. Let's look at how a returned array is built. To indicate that an array will be returned, it must first be initialized:
array_init(result);
To add values to the array, use the function that matches the kind of index and the type of value being added. For example:
add_next_index_long(result, 42);
A full list of these functions can be found here.
If anyone is interested, I can cover working with objects in the next article (mysqli is a classic example of an extension that exposes objects).
There is a very good article on this topic.
Performance
To check the performance, I chose a somewhat synthetic example: counting the occurrences of each character in a string. In other words, we need a function that takes a string as a parameter and returns an array holding the number of times each character appears in that string. This example demonstrates working with large strings.
Here is my implementation. Don't judge the code too harshly; I still write more PHP than C:
PHP_FUNCTION(calculate_chars)
{
    char* text;
    int text_length;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &text, &text_length) == FAILURE) {
        return;
    }

    /* return_value is the zval that PHP_FUNCTION provides for the result */
    array_init(return_value);

    /* one counter per possible byte value */
    int table[256] = { 0 };
    for (int i = 0; i < text_length; i++) {
        table[((unsigned char*)text)[i]]++;
    }

    /* one-character, NUL-terminated key for add_assoc_long */
    char str[2];
    str[1] = '\0';
    for (int i = 0; i < 256; i++) {
        if (table[i]) {
            str[0] = (char)i;
            add_assoc_long(return_value, str, table[i]);
        }
    }
}
This code produces the following result:
user> php -r "print_r( calculate_chars('example') );"
Array
(
    [a] => 1
    [e] => 2
    [l] => 1
    [m] => 1
    [p] => 1
    [x] => 1
)
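The counting step does not depend on PHP at all, so it can be tried out on its own. The sketch below is an illustration under that assumption: count_bytes is a name chosen here, not a function from the extension, and it covers only the table-filling loop, not the construction of the PHP array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Counts how often each byte value occurs in text[0..length).
 * table must have room for 256 counters; it is zeroed first.
 * count_bytes is a name chosen for this sketch, not part of the extension.
 */
static void count_bytes(const char *text, size_t length, int table[256])
{
    const unsigned char *bytes = (const unsigned char *)text;
    size_t i;

    assert(text != NULL || length == 0);
    memset(table, 0, 256 * sizeof table[0]);
    for (i = 0; i < length; i++) {
        table[bytes[i]]++;   /* unsigned index, so bytes >= 0x80 stay in range */
    }
}
```

For the string "example" this leaves table['e'] at 2 and every other letter of the word at 1, which matches the print_r output above. The cast to unsigned char matters: with a plain char index, bytes above 0x7F could become negative indices on platforms where char is signed.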
Now let's compare the execution speed of this code with the same algorithm in native PHP:
// $text is the input string, $length is its length
$map = array();
for ($i = 0; $i < $length; $i++) {
    $char = $text[$i];
    if (isset($map[$char])) {
        $map[$char]++;
    } else {
        $map[$char] = 1;
    }
}
I will compare the execution time of both solutions using the microtime function. We take a string of 100 characters, a string of 5000 characters, and a string of 69000 characters (I used the book A Message from the Sea by Charles Dickens; I hope he will forgive me), and for each one we run both solutions several thousand times. The results are shown in the table below. Testing was done on my not very powerful home laptop and on a VDS running Debian. Yes, I fully understand that the results may depend on the configuration, the operating system version, the PHP version, atmospheric pressure and wind direction, but I only wanted to show approximate numbers.
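The measuring pattern itself is simple: note the time, run the code under test a fixed number of times, note the time again. Below is a hedged sketch of that pattern in C. It is not the article's test script: time_runs and workload are hypothetical names, and clock() measures CPU time rather than the wall-clock time that microtime reports.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature of the code under test. */
typedef void (*bench_fn)(const char *text, size_t length);

/*
 * Runs fn on the same input `iterations` times and returns the elapsed
 * CPU time in seconds, measured with clock(). time_runs is a hypothetical
 * helper for this sketch; the article's own script uses PHP's microtime.
 */
static double time_runs(bench_fn fn, const char *text, size_t length, long iterations)
{
    clock_t start, stop;
    long i;

    assert(fn != NULL && iterations > 0);
    start = clock();
    for (i = 0; i < iterations; i++) {
        fn(text, length);
    }
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Example workload (hypothetical): counts its own calls and reads every byte. */
static long workload_calls = 0;
static volatile unsigned long workload_sink = 0;

static void workload(const char *text, size_t length)
{
    size_t i;

    workload_calls++;
    for (i = 0; i < length; i++) {
        workload_sink += (unsigned char)text[i];
    }
}
```

A call such as time_runs(workload, "example", 7, 1000) returns the total for all 1000 runs. Timing the whole loop rather than each call keeps the cost of reading the clock out of the measurement, which matters when a single call takes only a few microseconds.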
The full code of the test script can be downloaded here. The sources and binaries of the extensions themselves can be downloaded here (win) and here (nix).
| Test | Number of iterations | PHP code / Win32 | PHP code / Debian | PHP extension / Win32 | PHP extension / Debian | Speedup / Win32 | Speedup / Debian |
| 1. String of 100 characters | 1,000,000 | 84.7566 sec | 72.5617 sec | 8.4750 sec | 4.4175 sec | 10 times | 16.43 times |
| 2. String of 5000 characters | 10,000 | 39.1012 sec | 31.7541 sec | 0.5001 sec | 0.134 sec | 78.19 times | 236.98 times |
| 3. String of 69000 characters | 1000 | 52.3378 sec | 44.0647 sec | 0.4875 sec | 0.0763 sec | 107.36 times | 577.51 times |
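The totals in the table are easier to interpret once they are divided by the number of iterations. The two helpers below are a small sketch of that arithmetic; per_call_us and speedup are names introduced here for illustration only.

```c
#include <assert.h>

/* Average time of one call in microseconds, given a total in seconds. */
static double per_call_us(double total_seconds, long iterations)
{
    assert(iterations > 0);
    return total_seconds / (double)iterations * 1e6;
}

/* How many times faster the candidate is than the baseline. */
static double speedup(double baseline_seconds, double candidate_seconds)
{
    assert(candidate_seconds > 0.0);
    return baseline_seconds / candidate_seconds;
}
```

Applied to the Win32 figures for the 100-character string, this gives about 84.8 microseconds per call for the PHP code against about 8.5 microseconds for the extension, a ratio of roughly 10. For the 69000-character string on Debian, the ratio 44.0647 / 0.0763 comes out at roughly 577.5.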
Conclusions
Judging the performance of the module against the interpreted code, we see that tangible gains appear on large amounts of data with a small number of iterations. That is, for algorithms that are called often but are not very resource-intensive, moving them into compiled code makes little sense. For algorithms that work with large amounts of data, however, it can be practical. My measurements also show that the PHP code gives comparable results on the two systems (remember that these were two different machines), while the results of the extension differ greatly. From this I personally conclude that there are some compilation details I am not aware of. Then again, I strongly doubt that anyone uses a Windows server for PHP projects. I also very much doubt that anyone will rush off right now to rewrite something in C; this article is more for fun than a guide to action. I just wanted to show that writing a PHP extension is very simple and can sometimes be very useful.
UPD1. Comparison with count_chars
An interesting question came up in the comments: how does this compare with the performance of the count_chars function?
I increased the number of iterations (tenfold for the short string, a hundredfold for the other two) and ran the same test using this function. On Debian the results are almost equal, while under Windows the situation is more interesting: the larger the data, the more my module loses in performance. Let me remind you that the point of the test was not to reinvent the wheel but to take an algorithm that works with large amounts of data.
| Test | Number of iterations | count_chars / Win32 | count_chars / Debian | extension / Win32 | extension / Debian | Speedup / Win32 | Speedup / Debian |
| 1. String of 100 characters | 10,000,000 | 67.5245 sec | 47.8104 sec | 81.8185 sec | 43.8091 sec | 0.83 times | 1.09 times |
| 2. String of 5000 characters | 1,000,000 | 22.4693 sec | 12.8959 sec | 47.2514 sec | 12.9577 sec | 0.48 times | 0.99 times |
| 3. String of 69000 characters | 100,000 | 15.0681 sec | 7.661 sec | 46.9598 sec | 7.7387 sec | 0.32 times | 0.99 times |
Materials
- Hacker's Guide to the Zend Engine, php.net
- Compiling shared PECL extensions with phpize, php.net
- Creating a PHP Extension for Windows using Microsoft Visual C++ 2008, talkphp.com
- Extension Writing Part I: Introduction to PHP and Zend, devzone.zend.com
- Extension Writing Part II: Parameters, Arrays, and ZVALs, devzone.zend.com
- Wrapping C++ Classes in a PHP Extension, devzone.zend.com