For when you dream at night of "oh no, what if the server can't keep up..."
For a start, good night. This is the first time I'm writing something actually useful (apart from various half-baked experiments in my blog). I'm terribly curious by nature, and it suddenly occurred to me that I might save someone a lot of time ;).
In general, on fairly large PHP projects (> 100,000 lines of code) the desire to do "properly" what was done long ago threatens to plunge everything into chaos. At least for the new programmers who may join the company in a week, a month, a year... The solution is clear systematization from the very beginning and strict architectural rules. For myself I decided: without a framework I will only write "Hello World" sites. To cut a long story short, when I thought about frameworks I leafed through and read a few, but decided to surrender to Zend and its
ZendFramework. It is mighty, although I made a huge number of changes to it.
Along with all the possible advantages and conveniences of this decision, a question suddenly rises up like a wall: my business logic now probably takes somewhere around 1-2% of the total execution time of the program. The price of convenience and OOP (or "the convenience of OOP"? probably even just "convenience", or just "OOP" - they are almost the same thing ;)) is a huge amount of supporting and plumbing code.
In general, when I started the new project, the goal was at least 50 requests per second on a seedy Celeron 2.6GHz. That is, about 0.02 seconds per request, including MySQL and everything else. While building the project I managed to speed it up several times over with a handful of improvements. Which ones? Pour yourself a cup of coffee and welcome to the world of sane development :) I'll say right away: it worked out.
Optimization from A to Z. MockSoul's soup recipe :)
Stage 0. Preparing
Environment? My favorite scheme:
- lighttpd. Under Linux. With sys-epoll enabled;
- PHP5. Via FastCGI. PHP should be compiled with CGI support and shared memory (or threads; shared memory is better, and the two will not compile together anyway ;)). A wild example of the flags I build PHP with:
./configure' '--prefix=/usr/lib/php5' '--host=i686-pc-linux-gnu' '--mandir=/usr/lib/php5/man' '--infodir=/usr/lib/php5/info' '--sysconfdir=/etc' '--cache-file=./config.cache' '--disable-cli' '--enable-cgi' '--enable-fastcgi' '--disable-discard-path' '--disable-force-cgi-redirect' '--with-config-file-path=/etc/php/cgi-php5' '--with-config-file-scan-dir=/etc/php/cgi-php5/ext-active' '--without-pear' '--disable-bcmath' '--with-bz2' '--disable-calendar' '--disable-ctype' '--without-curl' '--without-curlwrappers' '--disable-dbase' '--disable-exif' '--without-fbsql' '--without-fdftk' '--disable-filter' '--disable-ftp' '--with-gettext' '--without-gmp' '--disable-hash' '--disable-ipv6' '--disable-json' '--without-kerberos' '--enable-mbstring' '--with-mcrypt' '--without-mhash' '--without-msql' '--without-mssql' '--with-ncurses' '--with-openssl' '--with-openssl-dir=/usr' '--disable-pcntl' '--without-pgsql' '--without-pspell' '--without-recode' '--disable-simplexml' '--enable-shmop' '--with-snmp' '--disable-soap' '--enable-sockets' '--without-sybase' '--without-sybase-ct' '--disable-sysvmsg' '--disable-sysvsem' '--disable-sysvshm' '--with-tidy' '--disable-tokenizer' '--disable-wddx' '--disable-xmlreader' '--disable-xmlwriter' '--without-xmlrpc' '--without-xsl' '--disable-zip' '--with-zlib' '--disable-debug' '--enable-dba' '--without-cdb' '--without-db4' '--without-flatfile' '--with-gdbm' '--without-inifile' '--without-qdbm' '--with-freetype-dir=/usr' '--with-t1lib=/usr' '--disable-gd-jis-conv' '--with-jpeg-dir=/usr' '--with-png-dir=/usr' '--without-xpm-dir' '--with-gd' '--with-ldap' '--without-ldap-sasl' '--with-mysql=/usr' '--with-mysql-sock=/var/run/mysqld/mysqld.sock' '--without-mysqli' '--without-pdo-dblib' '--with-pdo-mysql=/usr' '--without-pdo-odbc' '--without-pdo-pgsql' '--without-pdo-sqlite' '--with-readline' '--without-libedit' '--with-mm' '--without-sqlite'
We hook it up to lighttpd properly, not just any old way:
fastcgi.server = (
    ".php" => (
        "localhost" => (
            "socket" => "/tmp/php5-gmru-sandbox-mocksoul-lighttpd.sock", [#1]
            "bin-path" => "/usr/lib/php5/bin/php-cgi -c " + "/path/to/application/config/php_config_dir", [#2]
            "min-procs" => 1, [#3]
            "max-procs" => 1, [#3]
            "bin-environment" => (
                "PHP_FCGI_CHILDREN" => "32", [#4]
                "PHP_FCGI_MAX_REQUESTS" => "3200" [#5]
            )
        )
    )
)
([#1], [#2], ... is how I will refer to comments on the code. If you want to copy the code, you will have to erase these markers. I will follow the same scheme throughout.)
- [#1] - unix sockets are much faster than TCP sockets, so use TCP only when there is a serious need for it (or, haha, under Windows :))
- [#2] - here I simply showed how you can attach a separate php config to a particular host (via -c we point at the folder with php.ini)
- [#3] - min-procs and max-procs MUST BE = 1!!! Why? Because of bytecode caching, which I will get to below. The cache stops making sense if the number of PHP processes is more than 1
- [#4] - a magic incantation. We ask PHP to spawn 32 threads inside one process to handle requests from lighttpd. Important: if you set, say, 10 and all 10 get occupied by some wild 10-second script, lighttpd will return a 500 error! I.e. the number of threads does not grow at runtime - set 32, 64 or even 128 (it works like a thread pool)
- [#5] - kill a worker and create a new one after this many requests. Just in case - PHP is not perfect :).
- Opcode cacher. Or bytecode cacher. Or "what kind of idiocy is this - parsing the same files on every request?!". I very (VERY!) strongly recommend APC (Alternative PHP Cache), which lives in PECL. You could also use eAccelerator or even ZendOptimizer. Tastes differ... But choosing between eAccelerator and APC, I recommend APC. Why? If only for the ability to put anything you like into the shmem segment :). More on that below.
Stage 1. We write
First we just write. We write, and we chase away the thoughts about how to make this or that piece smarter and faster right now, so as not to get distracted (which is probably a perfectly natural urge for any self-respecting programmer %))
Things to pay attention to right away:
- You will hardly ever need plain require and include. Mostly require_once and include_once.
- For iterating over arrays, modifying and filtering them, learn to use PHP's array_* functions. Especially lambda functions:
<?php
$arr = array('that', 'is', 'this');
array_walk($arr, create_function('&$v, $k', '$v = $v . " yeah";'));
print_r($arr);
// outputs:
// Array
// (
//     [0] => that yeah
//     [1] => is yeah
//     [2] => this yeah
// )
// Would you have written this as a loop? Ah ah ah...
?>
- Passing a scalar by reference (e.g. $a = 1; call_func(&$a)) does not affect performance. Passing arrays by reference affects it a little. Passing objects affects it a lot. My point: do not pass anything by reference hoping to speed the program up. Pass by reference only when you actually need it
- Make classes static whenever possible, i.e. whenever the class does not need its own instance to do its work (a small sketch follows after this list)
- You can comment as much as you like - the bytecode cacher throws comments away anyway. The impact on performance is... hmm... about 0.000001% :)
- Avoid deep recursion. The classic task of listing files including subdirectories can be done without recursion at all (also sketched after this list) =)
- Read good documentation. The ZendFramework docs, for one, contain plenty of useful stuff even for those who do not use the framework and are not going to
- Try to split the code into logical blocks, so that you can take 10-20 consecutive lines and say: here I am doing ONLY THIS. Take another 10-20 and say: and here I am doing ONLY THAT. How many lines to take is of course up to you, but it is better that the blocks are no longer than 30-40 lines. Split the program, and any block, into initialization, configuration, the work itself, and saving the result (into a variable, say). What does this have to do with speed? In six months you will understand ;).
- The question "should I write $a = "some $v inline" or $a = 'some' . $v . 'var'" is not even worth thinking about. Personally I (IMHO) find inserting variables straight into strings absolutely idiotic. Best for readability:
- $var = 'some ' . $in . 'li' . $ne . ' variable';
- $var = sprintf('some %sli%s variable', $in, $ne);
- Use constants for things that never change. They are parsed at the very beginning and live in a different chunk of memory than ordinary variables. Constructs like $str = 'some' . STR_CONSTANT also read better. A particularly good candidate is the line break; call it what you like, I prefer NL (NewLine) or CRLF (CarriageReturnLineFeed)
- Do not forget that foreach does not have to copy the array :)
foreach ($arr as $key => &$val) { ... }
- Paradoxically, this one in PHP completely kills me: is_null() was invented by an idiot. if (null === $var) or if ($var === null) is faster than if (is_null($var))... idiocy. Do not use is_null() :)
- Regular expressions, string handling with the str_* functions and the like I leave to your conscience, as they are beyond the scope of this already bloated article :)
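A minimal sketch for the static-classes point above (the class and method names are invented for illustration, they are not from any real project):
<?php
// A utility class that keeps no state of its own, so there is no point
// in constructing an object on every call.
class TextUtil
{
    public static function slug($str)
    {
        $str = strtolower(trim($str));
        return trim(preg_replace('/[^a-z0-9]+/', '-', $str), '-');
    }
}

echo TextUtil::slug('Hello, World!'); // prints: hello-world
?>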
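And a sketch for the point about recursion: walking a directory tree with a plain stack instead of a recursive function (the path is a placeholder):
<?php
function listFiles($root)
{
    $result = array();
    $stack = array($root);
    while (count($stack) > 0) {
        $dir = array_pop($stack);
        foreach (scandir($dir) as $entry) {
            if ($entry == '.' || $entry == '..') {
                continue;
            }
            $path = $dir . '/' . $entry;
            if (is_dir($path)) {
                $stack[] = $path; // descend later, without calling ourselves
            } else {
                $result[] = $path;
            }
        }
    }
    return $result;
}
print_r(listFiles('/path/to/application'));
?>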
Stage 2. We think about where the time might be going
So... you have written something. Now let's see what usually eats a fair chunk of time besides your business logic:
- Connect to DB
- Handling tons of require_once and include_once
- DB queries themselves
- Do we store the config somewhere and parse it on every request? Use database models and initialize them every time? In general, look at how much identical work we redo on every single request!!
- Do we touch the file system? What for? Personally I think almost any project can be written with no IO at all (apart from the database and the like, of course). Small things do not need to be stored in the file system at all. Large things (some indexed gig-sized file) - those do
I have sorted this list by importance. Now, in order, about each of these gluttonous items:
Connect to DB
It's simple: if the server is yours, use persistent connections! PDO_MYSQL, MYSQL - they all can do it)
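A minimal sketch, assuming PDO with MySQL (the DSN and credentials are placeholders):
<?php
// Ask PDO to keep the connection alive between requests instead of
// reconnecting every time.
$db = new PDO(
    'mysql:host=localhost;dbname=mydb',
    'user',
    'password',
    array(PDO::ATTR_PERSISTENT => true)
);
// The old mysql_* extension has the same thing in the form of mysql_pconnect().
?>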
Handling tons of require_once and include_once
Here the fun begins =). For starters I looked at how many files get included on ANY request in ZendFramework. It turned out to be a little under 300 (!!!!). Without a bytecode cacher this becomes an abnormally long procedure.
The brute-force solution suggested itself: shove it all into one file. The question was how to figure out which files are included always and which only sometimes. I had no particular time to think it over at that point, so I solved this aspect by brute force too)
The wild result: http://www.mocksoul.ru/pub/dev/mkzend.phps
What it does:
- Looks at how often each file is accessed, using the APC cache statistics
- Draws a table of the results
- Mangles the Zend machinery :). I.e. cuts out all the require_once calls, comments, opening and closing php tags, extra whitespace... abuses it, in short :) See the source
- Saves the resulting giant script to a file...)
The script is absolutely unstable and tailored to one specific project. It has to be run through the browser so that APC is active. It is here purely as an example; with 100% probability it will not work for you as-is =).
As it turned out, the 300 files take about 2 seconds to parse, or 0.3 seconds when pulled from the bytecode cache, while the generated superfile parses in 0.7 seconds and comes out of the cache in 0.003 seconds. The project immediately got almost 3 times faster :). Maniacal optimization, yes. The method is only suitable for a production server, since developing against libraries that are loaded from one glued-together file is impossible.
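Not the real mkzend.phps, just a drastically simplified sketch of the idea, assuming you have already worked out the list of "always needed" files (paths are placeholders):
<?php
// Glue the always-included library files into one big script so that a
// single file is parsed (or fetched from the opcode cache) per request.
$alwaysIncluded = array(
    'Zend/Loader.php',
    'Zend/Controller/Front.php',
    // ... the rest of the list, e.g. derived from APC hit statistics
);

$big = "<?php\n";
foreach ($alwaysIncluded as $file) {
    $src = file_get_contents('/path/to/library/' . $file);
    $src = preg_replace('/^<\?php\s*/', '', $src);            // opening tag
    $src = preg_replace('/\?>\s*$/', '', $src);               // closing tag
    $src = preg_replace('/^\s*require_once.*$/m', '', $src);  // already inlined
    $big .= "\n" . $src . "\n";
}
file_put_contents('/path/to/application/allinone.php', $big);
?>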
Queries to the database
Have a word with your DBA and finally start using the MySQL query cache. In my.cnf we write:
query_cache_size = 100M
We keep an eye on the cache state with:
show status like 'qcache%'
And we read the MySQL docs on the Query Cache very, very carefully.
Stop doing the same thing - cache!
Read the config? Parsed it? Got a ready-made array? Then why parse it again? ) You have shared memory right at hand in the form of APC! :) Unbelievably fast... Store in it everything you can: configuration, assembled objects, the results of "describe table" queries and the like (that one is the prerogative of Zend_Db_Table_*). Data comes out of this cache at an unimaginable speed, somewhere around 0.000001s. If you do not duplicate anything, you can keep a hell of a lot of data in memory; remember that 1 gig is a huge pile of information. Do not use file system IO for this - memory is better. Depending on your skill, this gives from 10 to 100% speed increase. See below about APD ;)
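A minimal sketch of that idea with APC's user cache (the key, path and TTL are arbitrary examples):
<?php
$config = apc_fetch('myapp_config');
if ($config === false) {
    // cache miss: parse once, then keep the ready array in shared memory
    $config = parse_ini_file('/path/to/application/config.ini', true);
    apc_store('myapp_config', $config, 300); // re-parse at most every 5 minutes
}
?>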
Why do you need FS?
Use the FS as a store only for things that do not fit into memory. Even if you are writing a log or request statistics - put them into APC! And flush to disk, say, once every 5 minutes.
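A rough sketch of that approach (key names and the log path are made up; races between workers are ignored for brevity):
<?php
function bufferedLog($line)
{
    // accumulate lines in shared memory instead of hitting the disk every time
    $buffer = apc_fetch('myapp_log_buffer');
    if ($buffer === false) {
        $buffer = array();
    }
    $buffer[] = date('c') . ' ' . $line;

    $lastFlush = apc_fetch('myapp_log_flushed');
    if ($lastFlush === false || time() - $lastFlush > 300) {
        // touch the filesystem only once per 5 minutes
        file_put_contents('/tmp/myapp.log', implode("\n", $buffer) . "\n", FILE_APPEND);
        $buffer = array();
        apc_store('myapp_log_flushed', time());
    }

    apc_store('myapp_log_buffer', $buffer);
}
?>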
Stage 3. Tired of guessing where the time goes. We want a picture in front of our eyes!
This turned out to be a very valuable discovery for me. A step-by-step guide:
- We need PECL APD (Advanced PHP Debugger)
- Configure the dump dir for APD in the config. Something like:
zend_extension = /usr/lib/php5/lib/php/extensions/no-debug-non-zts-20060613/apd.so
apd.dumpdir = "/tmp/php-apd-dump"
- At the very top of the main entry file we write
apd_set_pprof_trace();
thereby turning on the profiler dump
- We make 1-100 requests to the server. Each request leaves a new file in our /tmp/php-apd-dump
- Now we can look at the profiler results right in the console - the pprofp script ships with APD
- Or we can do an even neater thing - convert the dumps into a more universal format :). Along with pprofp, APD ships pprof2calltree, which converts profiler dumps into the format understood by cachegrind tools, KCacheGrind in particular. Open the resulting file in kcachegrind - and applaud with pleasure.
So we end up with a perfectly ordinary profiler; it is just that I had never done this for PHP before ;)
Stage 4. Testing
It is foolish to check the speed with simple requests to a single URL using ab or ab2.
A more sensible option is to make a list of all (or not all ;)) URLs, put it into a text file, take Siege and test with that. During the test, watch the TPS (TransactionsPerSecond) on the disks (for example with iostat from the sysstat package), watch CPU utilization, and make sure that by the end there are no server responses other than 2xx.
What is all this for?
Because when the project grows, you have to squeeze speed out of everything. A 10% speedup on 1 server gives you exactly 10%. But if you already have 10 servers, a 10% speedup equals adding an 11th server, i.e. +100% in terms of one server. That is a lot. That is money. And that is a higher entry threshold for your competitors ;).
Eeee
Two days ago I broke my collarbone. And wrote all this with one hand. A monument to me!!! :))
Kind Regards, Vadim Burmakin aka MockSoul © 2007