Stabilizing PHP in production: what makes the web server "crash", and why

You are responsible for the stability of a PHP web project. The load keeps growing, features are being added, customers are happy. Then one day mysterious errors begin to appear ...

Server software errors

... which programmers do not know how to fix, because it is the server software itself that "breaks", for example the Apache and PHP pairing, and the client receives a maintenance page in response to a request. A web developer often lacks deep knowledge of C programming on Unix/Linux, and a sysadmin, unfortunately, often does not go deeper than bash. Real hardcore :-)

Unstable server scripts

Often certain pages of a web project start to misbehave. For example, they run for 15 minutes, and finding out what they are doing is not easy. In my previous post on this topic I described one method for determining what a PHP script is doing on a production server, but it feels like we need a more powerful tool.

In practice I often come across projects that run into this class of "server software" errors, and the team does not always know what to do. Segmentation fault messages keep appearing in the Apache log, clients get an error page, and the web developer and the sysadmin rack their brains, juggle different versions of PHP, Apache and the precompiler, rebuild PHP from source with different options again and again, file bug reports, and are told that the bugs are not in PHP but in their own code, and so on ad infinitum ...
In this article I want to show how to find, quickly and simply, the reason PHP crashed on a production server and eliminate it, without diving into the wonderful world of Unix system programming in C :-) All you need is the will and one cup of coffee.

Looking in the web server error log

If you see something like this in the Apache error log, this article is for you:

[Mon Oct 01 12:32:09 2012] [notice] child pid 27120 exit signal Segmentation fault (11)

In this case it is pointless to look for details in the PHP error log: the process itself crashed, not the script. If you have not set up a friendly maintenance page in nginx in advance, customers will see an austere "50x" error.

You want to punch someone, but whom? :-) To steer away from destructive decisions, let us recall the theory.

What is a "signal"? Roughly speaking, it is the means the operating system uses to tell a process that, for example, it is doing something wrong :-) Say the process defies the laws of mathematics and divides by 0, or brutally overflows the stack. In our case we see signal number 11, named "SIGSEGV". The list of signals can be viewed by running "kill -l":

...
11) SIGSEGV
...

By default, a SIGSEGV means your apache-PHP process is mercilessly killed by the kernel without trial. I used to think this signal could not be intercepted at all; it turns out a process can intercept it, but for that you have to go into the source :-)
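A quick way to connect the number in the log with a signal name is to let the shell do the lookup. The sketch below is only an illustration: the function name signal_from_status is made up for this example, and it relies on the common shell convention that a process killed by signal N is reported with exit status 128 + N.

```shell
# Translate an exit status into the signal that killed the process, if any.
# Shells report "killed by signal N" as exit status 128 + N.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # "kill -l N" prints the name of signal number N
        kill -l "$((status - 128))"
    else
        echo "none"
    fi
}

signal_from_status 139   # 139 = 128 + 11, prints SEGV on Linux
```

The same lookup works for any other signal number you meet in the error log.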

And what was it killed for?

Now let us find out why apache-PHP was killed. To do this, you need to arrange for a memory dump of the process to be written at the moment it is killed :-), that is, a core dump. Yes, this 50-year-old term is still in use; it refers to saving the contents of magnetic-core memory. The next time the operating system kills the process, the kernel will create the file; its location and name are configurable. In the console, just type "man 5 core".

For example, you can send the files to a folder like this:
echo "/tmp/httpd-core.%p" > /proc/sys/kernel/core_pattern

If nothing is set, the system creates a file named "core.#process_number#" in the working directory of the process.

Just make sure the apache-php process has write access to that directory.
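Before waiting for a crash, it helps to check where the kernel will actually put the file. Below is a minimal sketch with a made-up helper, describe_core_pattern. It handles only the %p specifier used above, and it also recognizes a pattern that starts with "|", which, as "man 5 core" describes, means the dump is piped to a helper program instead of being written to a file.

```shell
# Explain where a core dump would go for a given core_pattern and PID.
# Only the %p (PID) specifier is expanded here; see "man 5 core" for the rest.
describe_core_pattern() {
    pattern=$1
    pid=$2
    case $pattern in
        '|'*) echo "piped to helper: ${pattern#|}" ;;
        *)    echo "file: $(printf '%s\n' "$pattern" | sed "s/%p/$pid/g")" ;;
    esac
}

describe_core_pattern "/tmp/httpd-core.%p" 22596
```

On a live system you would pass the real value, for example describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)" 22596. If the answer is "piped to helper", the file will not show up in "/tmp" until the pattern is changed.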

That's not all. Core dump generation is most likely disabled by default on your system. You can enable it by inserting a line at the beginning of the web server startup script:
ulimit -c unlimited
or, to make the setting permanent, edit the "/etc/security/limits.conf" file. There you can insert:

apache - core -1

Details on the file format - "man limits.conf".
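To confirm that the limit really reached a process, you can read it back from "/proc". This is a sketch, assuming a Linux system; the helper name core_soft_limit is invented for the example, and it simply picks the soft value out of the "Max core file size" line.

```shell
# Print the soft core-file limit from /proc/<pid>/limits-style text on stdin.
# "0" means core dumps are disabled for that process.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }'
}

# Check the current shell; for Apache, substitute the PID of a worker process.
if [ -r "/proc/$$/limits" ]; then
    core_soft_limit < "/proc/$$/limits"
fi
```

If the value printed for an Apache worker is still 0 after the restart, the ulimit line or the limits.conf entry did not take effect for that process.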

However, nothing worked for me until I also set a directory for Apache core dump files (in "/etc/httpd/conf/httpd.conf"):

CoreDumpDirectory /tmp

Now restart Apache:

service httpd restart

Let's test. Kill a process manually:
ps aux | grep httpd
...
kill -11 12345

Look in "/var/log/httpd/error_log":

[Mon Oct 01 16:12:08 2012] [notice] child pid 22596 exit signal Segmentation fault (11), possible coredump in /tmp

In "/tmp" we now find a file with a name like "/tmp/httpd-core.22596".
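If you would rather not kill a live Apache worker, the same rehearsal can be done on a throwaway process. The sketch below uses "sleep" as the victim; whether a core file actually appears depends on the core limit of the shell you run it in and on core_pattern, so treat it as a dry run of the mechanism rather than a test of the Apache configuration.

```shell
# Rehearse a crash on a throwaway process instead of a live Apache worker.
ulimit -c unlimited 2>/dev/null || echo "cannot raise the core limit in this shell"

sleep 60 &                 # harmless victim process
pid=$!
kill -SEGV "$pid"          # same as "kill -11"

status=0
wait "$pid" || status=$?   # 128 + signal number when killed by a signal
echo "exit status: $status"
```

An exit status of 139 (128 + 11) confirms the process died from SIGSEGV; then look for the core file where core_pattern points.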

You now know how to get a memory dump of a killed process. Now we wait for the process to be killed in the natural way.

At the crime scene: interpreting the core dump

It is important to know that if PHP is built without debugging symbols (the --enable-debug configure option, or -g for gcc at compile time), we lose a lot of useful information. However, if you compiled PHP from source, even without this option, and the source is at hand, that may be enough for analysis.
There is also a very common misconception that debugging symbols affect performance and the memory consumed by the process. Symbols added with -g do not; they only increase the size of the executable file. Note that PHP's --enable-debug goes further than -g: it also turns off compiler optimization and enables internal checks, so plain -g is the gentler choice for a production build. Therefore, if you cannot understand the cause of the error without debugging symbols, ask the sysadmin to build the PHP module with them.

How do you open a core dump? With the good old gdb utility, of course, originally written by Richard Stallman, the chief apostle of the free software movement.
Understanding how the debugger works does not take long. You can absorb one of the most entertaining textbooks in a couple of hours, or you can ask a sysadmin to do it ;-)

A core dump is usually opened like this:
gdb path_to_executable_file_webserver path_to_coredump
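When you only need the call stack, gdb does not have to be run interactively. Here is a small sketch: the function name core_backtrace and the example paths are assumptions, while -batch (exit after running the commands) and -ex (run one command) are regular gdb options.

```shell
# Print the C-level backtrace of a core file without an interactive session.
core_backtrace() {
    exe=$1
    core=$2
    if [ ! -r "$exe" ] || [ ! -r "$core" ]; then
        echo "usage: core_backtrace <executable> <corefile>" >&2
        return 1
    fi
    # "bt" prints the stack frames of the crashed thread
    gdb -batch -ex "bt" "$exe" "$core"
}

# Example (paths are assumptions for illustration):
# core_backtrace /usr/sbin/httpd /tmp/httpd-core.22596
```

The output is convenient to attach to a bug report or to send to whoever on the team reads C stack traces.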

Every self-respecting C developer on Unix knows how to use this debugger, of course, and probably does so every day, but unfortunately you may not have one on your team. And there is another unpleasant BUT ...

Debugging PHP in gdb: black magic

The fact is that a PHP script compiled into bytecode is ... not exactly a C program ;-) You need to understand the insides of the Zend engine, if only a little, and then everything becomes clear fairly quickly. Namely, you need to find the last call to the execute function in the backtrace, go to that stack frame and examine its local variables (op_array), and also look into the global variables of the Zend engine:

(gdb) frame 3
#3  0x080f1cc4 in execute (op_array=0x816c670) at ./zend_execute.c:1605
(gdb) print (char *)(executor_globals.function_state_ptr->function)->common.function_name
$14 = 0x80fa6fa "pg_result_error"
(gdb) print (char *)executor_globals.active_op_array->function_name
$15 = 0x816cfc4 "result_error"
(gdb) print (char *)executor_globals.active_op_array->filename
$16 = 0x816afbc "/home/yohgaki/php/DEV/segfault.php"

It is easy to get lost in op_array, so the command that shows the type of this structure is useful:

(gdb) ptype op_array
type = struct _zend_op_array {
    zend_uchar type;
    char *function_name;
    zend_class_entry *scope;
    zend_uint fn_flags;
    union _zend_function *prototype;
    zend_uint num_args;
    zend_uint required_num_args;
    zend_arg_info *arg_info;
    zend_bool pass_rest_by_reference;
    unsigned char return_reference;
    zend_uint *refcount;
    zend_op *opcodes;
    zend_uint last;
    zend_uint size;
    zend_compiled_variable *vars;
    int last_var;
    int size_var;
    zend_uint T;
    zend_brk_cont_element *brk_cont_array;
    zend_uint last_brk_cont;
    zend_uint current_brk_cont;
    zend_try_catch_element *try_catch_array;
    int last_try_catch;
    HashTable *static_variables;
    zend_op *start_op;
    int backpatch_count;
    zend_bool done_pass_two;
    zend_bool uses_this;
    char *filename;
    zend_uint line_start;
    zend_uint line_end;
    char *doc_comment;
    zend_uint doc_comment_len;
    void *reserved[4];
} *

The debugging process consists of moving between stack frames ("frame N"), switching to each call of the execute function and examining its local variables ("print name", "ptype name"). The smaller the frame number, the deeper you are in the call chain. Sometimes it is useful to step into the frame of a PHP extension and see where the error occurred and why (or at least try to understand the reason).

(gdb) frame ##
(gdb) print op_array.function_name
$1 = 0x2aaab7ca0c10 "myFunction"
(gdb) print op_array.filename
$2 = 0x2aaab7ca0c20 "/var/www/file.php"

And so on…

If you choked on your coffee :-), just remember this: by going through the frames of the call stack with the "frame #N#" command and looking at only a few fields of this structure, you can determine in which PHP file a PHP function was called, which function it called in turn, and so on, and thus get to the cause of the "Segmentation fault" or other error that killed the process. Then explain the cause to the programmers and fix it! Quickly and, one has to be optimistic, for good.
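Walking the frames by hand can often be shortened. The PHP source tree ships a ".gdbinit" file with gdb macros, among them "zbacktrace", which prints the PHP-level call stack. The sketch below wraps it in a hypothetical helper, php_backtrace; all paths are assumptions, the ".gdbinit" should come from the sources of the same PHP version you run, and the macro generally needs debugging symbols to work.

```shell
# Print a PHP-level backtrace from a core file using the gdb macros
# that ship in the PHP source tree (.gdbinit defines "zbacktrace").
php_backtrace() {
    gdbinit=$1   # path to .gdbinit from the PHP sources matching your build
    exe=$2       # web server (or php) executable
    core=$3      # core dump file
    for f in "$gdbinit" "$exe" "$core"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            return 1
        fi
    done
    gdb -batch -ex "source $gdbinit" -ex "zbacktrace" "$exe" "$core"
}

# Example (all paths are assumptions for illustration):
# php_backtrace /usr/src/php/.gdbinit /usr/sbin/httpd /tmp/httpd-core.22596
```

If the macro fails on your build, the manual "frame" and "print" approach described above still applies.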

Common causes of errors

Start browsing core dump files (or delegate it to a sysadmin) and you will quickly learn to sort the errors into groups:
1) Problems in PHP extensions. In this case either disable the extension or try adjusting its settings. You know for certain that the problem is there; the rest is a small matter.
2) Problems with recursion and the stack. You may hit an error in which a library function, for example in pcre, goes into recursion and calls itself twenty thousand times. You can either tune the library settings or, if you are lazy, give the process a larger stack (in "/etc/init.d/httpd"):

ulimit -s "set a larger value here"

The current value can be viewed with the command "ulimit -a" ("man ulimit", then search for "ulimit").
3) Problems in the PHP core. Here you need to write to the PHP developers :-)
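For the stack case in item 2, it is worth checking how much room there is before editing the startup script. A minimal sketch (the function name show_stack_limits is made up) that prints both the soft limit, which is what the process actually gets, and the hard limit, which is the ceiling the soft limit can be raised to without root:

```shell
# Show the soft and hard stack limits of the current shell (in kilobytes).
show_stack_limits() {
    echo "soft: $(ulimit -S -s)"
    echo "hard: $(ulimit -H -s)"
}

show_stack_limits
```

On the library side, for pcre under PHP the setting to look at is most likely the pcre.recursion_limit directive in php.ini, which caps the recursion depth before the stack runs out.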

Either way, the range of possible causes narrows considerably. Which is exactly what we need.

Debugging a running process

That's not all. If you cannot get a core dump, you can attach to the running process and walk around inside it. While you are inside the process, its execution is suspended ("ps aux | grep apache | grep 'T'" will show it in the tracing state). When you detach, it continues running. You can attach as follows:
gdb -p process_id
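To see from another terminal whether a process is currently held by the debugger, you can read its state from "/proc". This is a sketch for Linux only, with an invented helper name, proc_state. As far as I know, newer kernels report a traced process as lowercase "t" while older ones show "T", so check for both.

```shell
# Print the one-letter state of a process from /proc (Linux only).
# "t" (or "T" on older kernels) means it is stopped, e.g. by an attached debugger.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

proc_state "$$"   # state of the current shell, normally S or R
```

Run it with the PID you attached to; once you quit gdb or detach, the state should go back to S or R.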

Results

In this article we learned how to "properly cook" server software errors: make debug builds of apache-PHP, produce core dump files and interpret them correctly with a symbolic debugger. We also learned that from a core dump file you can find the specific PHP file and function that caused the error.

Now you can draw up a checklist for the manager on dealing with mysterious server errors that neither the web developers nor the sysadmins can figure out:

Enable collecting coredump files on the server (sysadmin)
If necessary, rebuild apache-php with debugging symbols (sysadmin)
Using gdb (allow a weekend to learn it), investigate the cause of the error (sysadmin together with a web developer)
Take measures to eliminate it or reduce the frequency of appearance: change settings, update software, write to bugtracker, disable PHP extension, etc.

In conclusion, I invite everyone to our cloud service Bitrix24, where we make effective use of all the technologies described in this article.

Good luck, and may your web projects run stably!

Source: https://habr.com/ru/post/153001/
