
Strace

I spent a minute thinking about a title and could not come up with anything better ...
Strace. There is probably no one who has not heard of strace. For those who have not: strace is a utility that traces system calls, the mechanism that forms the interface between a process and the operating system (kernel). These calls can be intercepted and read, which lets you see what a process is trying to do at any given moment. By intercepting them we can get a much better picture of a process's behavior, especially when something goes wrong. The kernel facility for tracing system calls is called ptrace; strace uses ptrace, reads the process's behavior and produces a report. Details can be found on the wiki or in the official man page. All of this is about Linux, of course; other operating systems have their own counterparts.
For me personally, strace is a tool of last resort. When all the logs have already been read, all the debug and verbose flags have been turned on, and the cause of the problem still has not been found, I pull strace out of my wide trousers. One caveat: strace is by no means a silver bullet that will immediately show and explain everything. Working with strace requires a certain amount of knowledge, and the wider and deeper that knowledge, the greater the chance of finding the problem.
The Internet is full of introductory articles describing examples of running strace. In this article I will show specific problems and how they were solved with strace.
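For completeness, here are a couple of typical ways to invoke it (the flags are standard strace options; the PID and file names below are placeholders):

# strace -f -o /tmp/trace.log <command>     # trace a command from the start, follow child processes, save output to a file
# strace -f -p 12345                        # attach to an already running process by PID
# strace -f -e trace=network -p 12345       # limit the trace to a subset of calls, here network-related ones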

The first case. A KVM virtual machine does not start when vhost=on is enabled for its network interfaces. If this parameter is omitted, the virtual machine starts successfully.
To begin with, I cut off all unnecessary parameters and left only the required options plus the parameters for creating the network interface. This is to avoid inflating strace's already voluminous output:
# qemu-system-x86_64 -m 4096 -netdev tap,ifname=tap90,id=tap90,vhost=on -device virtio-net-pci,netdev=tap90,mac=08:77:D1:00:90:90
qemu-system-x86_64: -netdev tap,ifname=tap90,id=tap90,vhost=on: vhost-net requested but could not be initialized
qemu-system-x86_64: -netdev tap,ifname=tap90,id=tap90,vhost=on: Device 'tap' could not be initialized

So, we run the virtual machine under strace and analyze the output. I will deliberately omit most of the output and show only the part that makes it possible to determine the cause of the problem.
# strace -f qemu-system-x86_64 -m 4096 -netdev tap,ifname=tap90,id=tap90,vhost=on -device virtio-net-pci,netdev=tap90,mac=08:77:D1:00:90:90
...
open("/dev/vhost-net", O_RDWR) = -1 ENOMEM (Cannot allocate memory)
write(2, "qemu-system-x86_64:", 19qemu-system-x86_64:) = 19
write(2, " -netdev", 8 -netdev) = 8
write(2, " tap,ifname=tap90,id=tap90"..., 65 tap,ifname=tap90,id=tap90,vhost=on) = 65
write(2, ": ", 2: ) = 2
write(2, "vhost-net requested but could no"..., 48vhost-net requested but could not be initialized) = 48
write(2, "\n", 1) = 1
write(2, "qemu-system-x86_64:", 19qemu-system-x86_64:) = 19
write(2, " -netdev", 8 -netdev) = 8
write(2, " tap,ifname=tap90,id=tap90"..., 65 tap,ifname=tap90,id=tap90,vhost=on) = 65
write(2, ": ", 2: ) = 2
write(2, "Device 'tap' could not be initia"..., 37Device 'tap' could not be initialized) = 37
write(2, "\n", 1) = 1

As you can see, the error occurs in the open() call when trying to open the /dev/vhost-net device. After that, the same error messages are printed that we saw when simply starting the virtual machine.
As a workaround, we had to drop the page cache and buffers by running echo 3 > /proc/sys/vm/drop_caches, after which the virtual machine started successfully. Later some of the machines were migrated to other hosts, but that is another story.
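The workaround itself, for reference (run as root; calling sync first writes out dirty pages so less useful data is discarded):

# sync
# echo 3 > /proc/sys/vm/drop_caches     # 1 = page cache, 2 = dentries and inodes, 3 = both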

The second case. Again a virtual machine, and looking ahead I will say that it is again a memory issue. Experience showed that the virtual machine does not start when a large amount of memory is allocated, in this case -m 10240. With smaller values the machine starts, even though there is more than 10 GB of free memory on the server. So we pull out strace again. In the example below I omit the full list of virtual machine startup options and part of the irrelevant strace output.
# strace -f qemu-system-x86_64 -m 10240 ...
open(process_vm_readv: Bad address 0x7eff1ccf40b8, O_RDONLY) = 25
fstat(25, process_vm_readv: Bad address {...}) = 0
mmap(NULL, 26244, PROT_READ, MAP_SHARED, 25, 0) = 0x7eff1e74c000
close(25) = 0
futex(0x7eff1cf31900, FUTEX_WAKE_PRIVATE, 2147483647) = 0
mprotect(0x7eff0c021000, 24576, PROT_READ|PROT_WRITE) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7efc72bfd000
munmap(0x7efc72bfd000, 20983808) = 0
munmap(0x7efc78000000, 46125056) = 0
mprotect(0x7efc74000000, 163840, PROT_READ|PROT_WRITE) = -1 ENOMEM (Cannot allocate memory)
munmap(0x7efc74000000, 67108864) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mprotect(0x7eff0c021000, 24576, PROT_READ|PROT_WRITE) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7efc72bfd000
munmap(0x7efc72bfd000, 20983808) = 0
munmap(0x7efc78000000, 46125056) = 0
mprotect(0x7efc74000000, 163840, PROT_READ|PROT_WRITE) = -1 ENOMEM (Cannot allocate memory)
munmap(0x7efc74000000, 67108864) = 0
write(2, process_vm_readv: Bad address 0x7eff0c00e000, 77 (process:10928): GLib-ERROR **: gmem.c:230: failed to allocate 131072 bytes) = 77
--- SIGTRAP {si_signo=SIGTRAP, si_code=SI_KERNEL, si_value={int=487903808, ptr=0x7eff1d14d240}} ---
<... syscall_16 resumed> ) = ? <unavailable>
+++ killed by SIGTRAP +++
Trace/breakpoint trap

And here is the already familiar Cannot allocate memory again, but this time in mprotect(). We look for what is limiting us and find the sysctl key vm.overcommit_memory set to 2. After changing it to 0, the machine starts successfully.
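Checking and changing the setting looks roughly like this (value 2 forbids overcommitting beyond swap plus a fraction of RAM, 0 is the default heuristic mode):

# sysctl vm.overcommit_memory
vm.overcommit_memory = 2
# sysctl -w vm.overcommit_memory=0
vm.overcommit_memory = 0
# echo 'vm.overcommit_memory = 0' >> /etc/sysctl.conf     # persist the change across reboots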
The third case. There is a process that runs fine for some time and then suddenly starts consuming CPU. Load average rises from 0.5 to 1.5 (on a virtual machine with only 2 virtual CPUs), latency grows, and service responsiveness drops. After the process is restarted the situation repeats after a while, and the time between start and failure is different every time. So we attach strace to the process at the moment it starts misbehaving. And here we see a lot of messages like:
ppoll([{fd=10, events=POLLIN|POLLPRI}], 1, {0, 90000000}, NULL, 8) = 1 ([{fd=10, revents=POLLIN}], left {0, 90000000})
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
epoll_wait(10, {{EPOLLIN, {u32=57661216, u64=57661216}}}, 4096, 0) = 1
rt_sigprocmask(SIG_SETMASK, ~[SEGV VTALRM RTMIN RT_1], NULL, 8) = 0
accept(15, 0x7ffff8f46af0, [16]) = -1 EMFILE (Too many open files)

The accept() system call fails with the error EMFILE (Too many open files). We look at the process limits in /proc/<pid>/limits and check the number of open files with lsof.
# grep files /proc/12345/limits
Max open files            1024                 1024                 files
# lsof -nPp 12345 | wc -l
1086

The open-file limit is 1024, and the current number of files opened by the process slightly exceeds it. Everything is clear here: first we raise the limit for the running process with prlimit, and then fix the limits permanently by editing /etc/security/limits.conf.
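Roughly like this (the new value and the user name are placeholders):

# prlimit --pid 12345 --nofile=4096:4096     # raise the soft and hard limits of the running process

# /etc/security/limits.conf
someuser  soft  nofile  4096
someuser  hard  nofile  4096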

The fourth case. A small digression is needed here. This time we look at a pgBouncer + PostgreSQL setup. The application works with the database through pgBouncer: it connects to it, pgBouncer hands it a database connection from the pool, the application does its work and disconnects, and the connection returns to the pool, where it stays until it is issued to the next client.
The problem: messages began to appear in the application log and in the postgres log saying that transactions cannot update/delete/insert data because the transaction is in READ ONLY mode.
During the investigation several hypotheses were put forward, and one of them later proved correct. A client connecting to the pooler sets its session to READ ONLY mode. After the client disconnects, this setting sticks to the server connection and lives with it until pgBouncer forcibly closes that connection. If you specify server_reset_query = DISCARD ALL in the pgBouncer settings, all session-level settings are reset when a client disconnects. So someone had set a READ ONLY session, the setting was preserved, and the connection was later handed out to other clients. DISCARD ALL was only a temporary fix while the culprit was being hunted down. The search took a long time, and suspicion fell on the application, but the developers assured us that they had checked all the relevant places in the code and had not found any problem areas ...
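For reference, the temporary workaround is a single line in pgbouncer.ini (a sketch; the rest of the config is omitted):

; pgbouncer.ini
[pgbouncer]
server_reset_query = DISCARD ALL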
So we take strace, attach it to pgBouncer and wait. We watch for the keyword READ ONLY and save the output to a separate log, since we will have to dig through it after the catch.
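Roughly like this (the file name is a placeholder; -s raises strace's string limit above the default 32 characters so queries are not truncated, and -e trace=network limits the output to network calls):

# strace -f -p $(pidof pgbouncer) -s 1024 -e trace=network -o /tmp/pgbouncer.trace
# grep -n 'READ ONLY' /tmp/pgbouncer.trace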
Some time passes, and here it is:
recvfrom(10, "Q\0\0\0:SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY;\0", 2048, 0, NULL, NULL) = 59
sendto(11, "Q\0\0\0:SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY;\0", 59, 0, NULL, 0) = 59
sendto(10, "C\0\0\0\10SET\0Z\0\0\0\5T", 15, 0, NULL, 0) = 15

We wait a bit longer (catching other requests from the same client), detach strace and open our log.
The most important things here are the descriptor numbers, in particular descriptor number 10 (note the recvfrom(10, ...)). We need to find who opened this descriptor before the captured message; a bit of grep, and here it is: the accept() system call (accept(...) = 10) returns that same descriptor number 10.
accept(8, {sa_family=AF_INET, sin_port=htons(58952), sin_addr=inet_addr("192.168.10.1")}, [16]) = 10
getpeername(10, {sa_family=AF_INET, sin_port=htons(58952), sin_addr=inet_addr("192.168.10.1")}, [16]) = 0
getsockname(10, {sa_family=AF_INET, sin_port=htons(6432), sin_addr=inet_addr("192.168.10.10")}, [16]) = 0
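One way to dig that accept() call out of the saved log (same placeholder file name as above):

# grep -n ' = 10$' /tmp/pgbouncer.trace | grep accept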

In a conversation with the developers it turned out that the address 192.168.10.1 belongs to the VPN server through which the developers connected. After some educational conversations and organizational measures, the problem never appeared again.

That's it. Thank you all for your attention.

Source: https://habr.com/ru/post/215577/

