
Problems using NtQuerySystemInformation with undocumented arguments

That morning began with us "breaking an if." The expression was coined long ago by a colleague of mine. He showed how his debugger stepped into an if block even though the condition the if was checking was definitely false. The cause turned out to be trivial: he was debugging an optimized release build, and with such a build you obviously cannot trust step-by-step debugging. But the phrase "broken if" stuck. We have used it ever since for situations where something so fundamental that it could not possibly fail suddenly stops working.

So that day, what broke was NtQuerySystemInformation. It is one of the most important functions in Windows and returns information about processes, threads, system handles, and so on. I once wrote an article about the benefits of using this function. But it turned out that even such cornerstones of the system can sometimes fail.

So, what happened?

For quite a long time (several years by then), we had been calling NtQuerySystemInformation with the SystemHandleInformation argument to get information about all handles in the system. Yes, this argument is formally undocumented. But if you search for how to list all handles of all running processes on Windows, the combination NtQuerySystemInformation + SystemHandleInformation is the option suggested most often. And it really does work, on every operating system since Windows NT.
Why would you need to look at handles in all processes? There are various reasons. Utilities such as Process Hacker simply display them for informational purposes. Some programs do it to find out who is currently locking a resource (for example, a file). You can also find the mutex that another program uses to allow only one instance of itself to run, close it, and so run two instances of that application. Or you can enumerate handles in order to duplicate them when building a sandbox. In short, there are many uses.

I will not quote our handle enumeration code in full. Suffice it to say that it was broadly similar to the common examples, like this one:

```c
while ((status = NtQuerySystemInformation(
            SystemHandleInformation,
            handleInfo,
            handleInfoSize,
            NULL)) == STATUS_INFO_LENGTH_MISMATCH)
    handleInfo = (PSYSTEM_HANDLE_INFORMATION)realloc(handleInfo, handleInfoSize *= 2);

// NtQuerySystemInformation stopped giving us STATUS_INFO_LENGTH_MISMATCH.
if (!NT_SUCCESS(status))
{
    printf("NtQuerySystemInformation failed!\n");
    return 1;
}

for (i = 0; i < handleInfo->HandleCount; i++)
{
    ...
}
```
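The loop above doubles the buffer until the call stops returning STATUS_INFO_LENGTH_MISMATCH. However, if realloc fails it overwrites the only pointer to the old buffer with NULL. Below is a minimal, portable sketch of the same retry-and-double pattern with that hole closed. So that it compiles anywhere, a hypothetical `mock_query` function and hypothetical `MOCK_STATUS_*` codes stand in for NtQuerySystemInformation and its NTSTATUS values. On Windows you would substitute the real call. The 4 KB starting size is also an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes standing in for the real NTSTATUS values. */
#define MOCK_STATUS_SUCCESS              0
#define MOCK_STATUS_INFO_LENGTH_MISMATCH 1
#define MOCK_STATUS_NO_MEMORY            2

/* Hypothetical stand-in for NtQuerySystemInformation: the "system" needs
   mock_required_size bytes and reports a length mismatch for anything less. */
static size_t mock_required_size = 5000;

static int mock_query(void *buffer, size_t size)
{
    if (size < mock_required_size)
        return MOCK_STATUS_INFO_LENGTH_MISMATCH;
    memset(buffer, 0, mock_required_size); /* pretend to fill in the data */
    return MOCK_STATUS_SUCCESS;
}

/* Doubles the buffer until the query succeeds. On success, *out receives
   the buffer (the caller must free it) and *outSize its size. */
static int query_with_retry(void **out, size_t *outSize)
{
    size_t size = 0x1000;
    void *buffer;
    int status;

    assert(out != NULL && outSize != NULL);
    buffer = malloc(size);
    if (buffer == NULL)
        return MOCK_STATUS_NO_MEMORY;

    while ((status = mock_query(buffer, size)) == MOCK_STATUS_INFO_LENGTH_MISMATCH) {
        void *bigger;

        if (size > SIZE_MAX / 2) {          /* doubling would overflow */
            free(buffer);
            return MOCK_STATUS_NO_MEMORY;
        }
        size *= 2;
        /* Keep the old pointer until realloc succeeds so it is not leaked. */
        bigger = realloc(buffer, size);
        if (bigger == NULL) {
            free(buffer);
            return MOCK_STATUS_NO_MEMORY;
        }
        buffer = bigger;
    }

    if (status != MOCK_STATUS_SUCCESS) {
        free(buffer);
        return status;
    }
    *out = buffer;
    *outSize = size;
    return status;
}
```

Keeping the previous pointer until realloc succeeds avoids losing the buffer on failure, and the `SIZE_MAX / 2` check stops the size from silently wrapping around. A loop is still needed with the real API because the number of handles can change between one call and the next.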

But then I launch our application, and suddenly it turns out that the handle I need is missing from the list returned by NtQuerySystemInformation(). I know for certain that this handle exists! That's it: the "if" is broken.

We tried to reproduce the problem on other computers in the office. On some it reproduced, on most it did not. We tried to work out what set the affected machines apart from the ones where everything was fine. The Windows version was the same everywhere, and so were the updates and the build of our program. Then someone noticed that all the laptops where the problem reproduced were the same model. A hardware incompatibility? But why now, when it had worked before? Besides, other laptops of the same model in the office were still working fine. We even compared device driver versions, and everything seemed identical. Yet on some laptops everything worked, and on others it did not.

I spent about half a day tearing my hair out, until I happened to notice two things:

  1. For some reason, process IDs had become six-digit numbers on my computer, where they were usually three, four or five digits. It was rather strange to see a PID like 780936, and I had never noticed such values before. At the same time, the total number of running processes was perfectly reasonable (no more than a hundred).
  2. Task Manager's CPU page (on the Performance tab) showed the total number of handles in the system, and it was huge: more than 800,000.

For an ordinary application, a hundred or two open handles is normal. Maybe a thousand. Chrome under active use can open about 2,000, and Visual Studio with a large project can open 3,000. But who had opened 800,000? Fortunately, the previously mentioned Process Hacker can show the handle count for each process and even sort the process list by it.

And what do we see? Something like this:



I should mention that I took the screenshot above just now, so the top process in the list has "only" about 20,000 handles. When I first saw the problem, it had about 650,000. And who is our hero? Bingo! It's SynTPEnhService.exe.

And then the whole puzzle came together in my head. SynTPEnhService.exe is part of the Synaptics touchpad driver. In our office, that driver was installed only on laptops of one particular model, which were exactly the machines where the problem occurred. A short observation showed that every 5 seconds this process starts a child process, SynTPEnh.exe, which exits after 1-2 seconds. Meanwhile, the parent process keeps holding a handle to the child process, so handles leak at a rate of one every 5 seconds. That is 17,280 handles per day (86,400 / 5). Leave the computer on for a week, and you have more than a hundred thousand leaked handles. My own computer had not been rebooted for more than a month, hence the PIDs of new processes above half a million. This also explains why the problem reproduced on some of our laptops but not on others of the same model. Some of my colleagues rebooted their PCs every day, while others, like me, left them running overnight.
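The leak rate above is plain division: one handle per spawn interval. Below is a small C helper for checking the numbers for other uptimes. The name `leaked_handles` is hypothetical, and it assumes exactly one leaked handle per child launch, as observed above.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_DAY 86400u

/* Handles leaked by a parent that starts one child process every
   interval_s seconds and never closes its handle to the child.
   Assumes exactly one leaked handle per launch. */
static uint64_t leaked_handles(uint64_t uptime_s, uint64_t interval_s)
{
    assert(interval_s > 0);
    return uptime_s / interval_s;
}
```

At 5 seconds per launch this gives 17,280 handles for one day and 120,960 for a week. For a 30-day uptime it gives 2,592,000 / 5 = 518,400.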

By the way, at this point I remembered that I had already read about some problem with the Synaptics touchpad drivers. After a little digging, I found this article by Bruce Dawson. Many of his articles have been translated on Habr over the years, but not this one. In it, he describes a memory leak caused by the same endless restarting of SynTPEnh.exe. However, he says nothing about leaking handles, so my finding is still different from his.

Solution to the problem


So the touchpad driver "eats" hundreds of thousands of handles. So what? The problem is that NtQuerySystemInformation(SystemHandleInformation, ...) was written back in the days of Windows NT and had (and still has) a fairly limited internal buffer. I have not found its exact size documented anywhere, but it was obviously not designed for a million handles. As a result, the function returns "as many handles as it can," and the one you need may or may not be among them.

What to do? As Rick from the animated series "Rick and Morty" said: "When you invent teleportation, you immediately discover an unpleasant thing: you are the last in the Universe to have invented it." As it turned out, Microsoft recognized this limited-buffer problem with NtQuerySystemInformation(SystemHandleInformation, ...) some 20 years ago. Starting with Windows XP, they added another (also undocumented) argument value: SystemExtendedHandleInformation. When you call NtQuerySystemInformation(SystemExtendedHandleInformation, ...), it returns all handles in the system, no matter how many there are. Or rather, I don't know for sure whether this argument has limits of its own. But I can say with certainty that it is able to return 800,000 handles.

Examples of using SystemExtendedHandleInformation can be found online, for example, this one. The code is essentially the same. It simply uses different structures, and that's all.
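Since the only change is the structures, it may help to see them side by side. The portable C below mirrors the entry layouts as they are commonly declared in third-party headers such as Process Hacker's phnt. These are not official Microsoft declarations: Windows types are replaced with fixed-width equivalents, and the names `LegacyHandleEntry`, `ExtendedHandleEntry`, `count_handles_for_pid` and `legacy_pid` are hypothetical. One detail in those declarations is worth noticing. The legacy entry stores the process ID and handle value in 16-bit fields, while the extended entry uses pointer-sized ones. A six-digit PID such as 780936 therefore does not fit in the legacy entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of the legacy (SystemHandleInformation) entry layout
   as commonly declared in third-party headers; not an official definition. */
typedef struct {
    uint16_t UniqueProcessId;       /* only 16 bits for the PID */
    uint16_t CreatorBackTraceIndex;
    uint8_t  ObjectTypeIndex;
    uint8_t  HandleAttributes;
    uint16_t HandleValue;           /* only 16 bits for the handle value */
    void    *Object;
    uint32_t GrantedAccess;
} LegacyHandleEntry;

/* Simplified mirror of the extended (SystemExtendedHandleInformation) entry. */
typedef struct {
    void     *Object;
    uintptr_t UniqueProcessId;      /* pointer-sized PID */
    uintptr_t HandleValue;          /* pointer-sized handle value */
    uint32_t  GrantedAccess;
    uint16_t  CreatorBackTraceIndex;
    uint16_t  ObjectTypeIndex;
    uint32_t  HandleAttributes;
    uint32_t  Reserved;
} ExtendedHandleEntry;

/* Counts the extended entries that belong to the process `pid`. */
static size_t count_handles_for_pid(const ExtendedHandleEntry *entries,
                                    size_t count, uintptr_t pid)
{
    size_t found = 0;
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (entries[i].UniqueProcessId == pid)
            found++;
    return found;
}

/* What a PID becomes when stored in the legacy 16-bit field. */
static uint16_t legacy_pid(uintptr_t pid)
{
    return (uint16_t)pid;
}
```

For example, `legacy_pid(780936)` yields 60040. With the extended layout, filtering by PID is an exact comparison against the full value.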

This has been a cautionary tale about undocumented Windows arguments. They can be very useful, but they require careful testing and readiness for unusual problems.

Source: https://habr.com/ru/post/433906/

