It all started with "Little White". What it is and why it was needed is described here:
http://habrahabr.ru/company/oktell/blog/108726/ . In short, we had four production call-center servers with identical software settings and roughly the same configuration of external and internal lines, users, and internal database. The telephony operator delivered a uniform call load to each server, yet the servers reacted to it differently!
N   CPU                            RAM    Operating system
1   T1300 @ 1.66                   1 GB   Windows 2003 Standard Ed. R2 SP1 32 bit
2   Intel Core Duo E8400 @ 3000    4 GB   Windows 2003 Standard Ed. SP1 32 bit
3   Intel Pentium 4 3GHz           2 GB   Windows 2003 Standard Ed. SP2 32 bit
4   E3400 @ 2.60                   2 GB   Windows 2003 Standard Ed. R2 SP2 32 bit
We ran into complaints about call quality. The complaints were not about every call, only about "some" of them. People complained about the "quacks" that are so characteristic of VoIP telephony.
Fairly quickly we found that the "quacks" were caused by an unpredictable rise in processor utilization on one of the call-center servers (the first one) as the load increased. The other servers did not notice this load at all: with the same number of calls, their processor load did not grow. Even though the first server was significantly weaker than the others, processor utilization climbing to 100% should not have happened.
It probably goes without saying that we went through the standard routine of "wiping the headlights", "kicking the tires", and so on. In the end we concluded that neither the settings of the call center itself nor its DBMS affected the server's behavior. The starting point for understanding the problem was this: in the Task Manager process list, no process was consuming processor time, while on the Performance tab the CPU usage history showed a constant kernel load of about 20%.


The goal was to find out what the kernel is busy with when none of the other services are under load. Process Explorer, a standard utility from Microsoft, suggested that the main consumer of resources was "Hardware Interrupts". To analyze the reasons for this consumption, we downloaded another standard Microsoft utility, Kernrate Viewer. As described in its usage recommendations, we ran the command line "C:\Program Files\KrView\Kernrates\Kernrate_i386_XP.exe >> log.txt" and, after a while, stopped it with Ctrl-C. The result was a log.txt file containing information of the following form:
/ ============================== \
\ ============================== /
Date: 2010/12/08 Time: 1:09:14
Machine Name: RESERVCC
Number of Processors: 1
PROCESSOR_ARCHITECTURE: x86
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 0e08
Physical Memory: 1015 MB
Pagefile Total: 2450 MB
Virtual Total: 2047 MB
PageFile1: \??\C:\pagefile.sys, 1524MB
OS Version: 5.2 Build 3790 Service-Pack: 1.0
WinDir: C:\WINDOWS
Kernrate User-Specified Command Line:
Kernrate_i386_XP.exe
Kernel Profile (PID = 0): Source = Time,
Using Kernrate Default Rate of 25000 events/hit
------------ Overall Summary: --------------
P0 K 0:00:36.671 (28.5%) U 0:00:09.671 (7.5%) I 0:01:22.343 (64.0%) DPC 0:00:29.484 (22.9%) Interrupt 0:00:00.281 (0.2%)
Interrupts = 136809, Interrupt Rate = 1063/sec.
Total Profile Time = 128687 msec
                               BytesStart    BytesStop     BytesDiff.
Available Physical Memory,     443772928,    407339008,    -36433920
Available Pagefile(s),         2104172544,   2093707264,   -10465280
Available Virtual,             2132660224,   2131611648,   -1048576
Available Extended Virtual,    0,            0,            0
                               Total         Avg. Rate
Context Switches,              609407,       4736/sec.
System Calls,                  5078088,      39461/sec.
Page Faults,                   119817,       931/sec.
I/O Read Operations,           11671,        91/sec.
I/O Write Operations,          209479,       1628/sec.
I/O Other Operations,          229216,       1781/sec.
I/O Read Bytes,                39981700,     3426/I/O
I/O Write Bytes,               19240135,     92/I/O
I/O Other Bytes,               7130204725,   31107/I/O
- Results for Kernel Mode:
- OutputResults: KernelModuleCount = 99
Hits for the Kernel
Time 44651 hits, 25000 events per hit
Module      Hits    msec     %Total   Events/Sec
intelppm    27457   128685   61%      5334149
hal         12284   128685   27%      2386447
ntkrnlpa    2868    128685   6%       557174
win32k      525     128685   1%       101993
alder9xp    427     128685   0%       82954
tcpip       254     128685   0%       49345
NTFS        251     128685   0%       48762
afd         120     128685   0%       23312
RDPDD       109     128685   0%       21175
e1e5132     109     128685   0%       21175
iaStor      97      128685   0%       18844
NDIS        39      128685   0%       7576
RDPWD       31      128685   0%       6022
fltMgr      26      128685   0%       5051
amon        12      128685   0%       2331
termdd      10      128685   0%       1942
CLASSPNP    10      128685   0%       1942
ftdisk      7       128685   0%       1359
ipsec       3       128685   0%       582
Npfs        2       128685   0%       388
USBPORT     2       128685   0%       388
volsnap     2       128685   0%       388
TDTCP       1       128685   0%       194
rdbss       1       128685   0%       194
ws2ifsl     1       128685   0%       194
netbt       1       128685   0%       194
watchdog    1       128685   0%       194
PartMgr     1       128685   0%       194
================================= END OF RUN ==================================
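Before looking at the drivers, the "Overall Summary" line of the log is worth a quick sanity check. The sketch below is only an illustration in Python (not part of Kernrate): it parses that line and adds up the times. Reading K, U and I as kernel, user and idle time is an assumption on our part; it is supported by the fact that the three figures add up to the profile duration, while DPC and Interrupt are reported separately. The names SUMMARY, FIELD and parse_summary are made up for this example.

```python
import re

# The "Overall Summary" line from the log, with the stray spaces removed.
SUMMARY = (
    "P0 K 0:00:36.671 (28.5%) U 0:00:09.671 (7.5%) "
    "I 0:01:22.343 (64.0%) DPC 0:00:29.484 (22.9%) "
    "Interrupt 0:00:00.281 (0.2%)"
)

# One field looks like: NAME h:mm:ss.mmm (pp.p%)
FIELD = re.compile(
    r"\b(Interrupt|DPC|K|U|I)\s+(\d+):(\d+):(\d+\.\d+)\s+\(\s*([\d.]+)%\s*\)"
)


def parse_summary(line):
    """Return {field name: (seconds, percent)} for a summary line."""
    result = {}
    for name, hours, minutes, seconds, percent in FIELD.findall(line):
        total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        result[name] = (total_seconds, float(percent))
    return result


summary = parse_summary(SUMMARY)

# Assumption: K + U + I together cover the whole profiling interval.
covered = summary["K"][0] + summary["U"][0] + summary["I"][0]

for name, (seconds, percent) in summary.items():
    print(f"{name:<10} {seconds:8.3f} s  {percent:5.1f}%")
print(f"K + U + I = {covered:.3f} s")
```

The sum comes to about 128.7 seconds, in line with the "Total Profile Time" of 128687 msec. The figure that stands out is the DPC share: 22.9% on a server that is idle 64% of the time.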
Next, we determine which driver is behind the "Hardware Interrupts" load. In the Kernrate Viewer log it is the top entry in the module list, with its share of kernel time shown next to it as a percentage. Note that these percentages do not show the total system load; they show each module's share of the hits recorded in kernel mode.
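To make the meaning of these columns concrete, here is a small Python sketch that recomputes them for the top five modules of our log. It assumes that %Total is the module's hits divided by the total kernel hits (44651) and that Events/Sec is hits times the rate of 25000 events per hit, divided by the profile time in seconds, both rounded down. This is our reading of the numbers, not documented Kernrate behavior, but it reproduces the logged values exactly. All names in the code are illustrative.

```python
# Values copied from the log above.
TOTAL_KERNEL_HITS = 44651
EVENTS_PER_HIT = 25000
PROFILE_MSEC = 128685

# (module, hits) for the top five rows of the module table.
TOP_MODULES = [
    ("intelppm", 27457),
    ("hal", 12284),
    ("ntkrnlpa", 2868),
    ("win32k", 525),
    ("alder9xp", 427),
]


def percent_of_kernel(hits, total_hits=TOTAL_KERNEL_HITS):
    """Share of kernel-mode hits, truncated to a whole percent."""
    return hits * 100 // total_hits


def events_per_sec(hits, msec=PROFILE_MSEC, rate=EVENTS_PER_HIT):
    """Hits converted to events per second, truncated to an integer."""
    return hits * rate * 1000 // msec


for module, hits in TOP_MODULES:
    print(f"{module:<10} {percent_of_kernel(hits):3d}%  {events_per_sec(hits):>8d}")
```

The 61% for intelppm is therefore 61% of the samples taken in kernel mode, not 61% of the whole processor.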
The driver turned out to be intelppm (Intel processor power manager). From there, Google to the rescue. The Internet is great, powerful, and boundless. We quickly learned that problems with intelppm do not come up very often, but we were not the only ones to run into this one. Results were not long in coming: we found an article that not only describes the problem but also shows how to solve it (the permanent address of the original article is here:
http://www.osp.ru/text/print/302/5818429.html )
Following the recommendations of Stephen Dougherty, we learned that intelppm is a processor power management driver, which is not needed on a server that never runs on battery power. The article proposes several solutions: reinstalling, updating, or stopping the faulty driver. Which option to choose is up to you; it makes sense to follow Dougherty's original recommendations here.
We went into the registry. The settings of the Intel processor driver are stored in the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\intelppm. To disable the intelppm driver, we changed the value of the Start parameter from 1 to 4. Microsoft experts, of course, recommend backing up the registry first, but you and I are Russian people, and what is there to back up when it is just one parameter changing from 1 to 4.
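For those who prefer not to edit the value by hand, the same change can be written as a .reg file. The Python sketch below only generates the text of such a file; it does not touch the registry. The function name service_start_patch and the START_TYPES table are ours, introduced for illustration. The meaning of the Start values (0 boot, 1 system, 2 automatic, 3 manual, 4 disabled) is the standard one for Windows services and drivers.

```python
# Standard meanings of the Start value under ...\Services\<name>.
START_TYPES = {
    0: "boot",
    1: "system",
    2: "automatic",
    3: "manual",
    4: "disabled",
}

SERVICES_KEY = "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\"


def service_start_patch(service, start):
    """Return the text of a .reg file that sets Start for a service."""
    if start not in START_TYPES:
        raise ValueError(f"unknown start type: {start}")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{SERVICES_KEY}{service}]",
        f'"Start"=dword:{start:08x}',
        "",
    ]
    return "\r\n".join(lines)


# 4 disables the driver; 1 would restore the value we started with.
patch = service_start_patch("intelppm", 4)
print(patch)
```

Saving a second file with Start set back to 1 gives a simple way to undo the change. For a real backup, exporting the key beforehand with the standard reg export command is the safer route.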

After the restart we confirmed that the processor load now averages 60-70% even under a full (!) call load on the call-center server.
With the driver disabled, Device Manager looks quite interesting:

It complains, moans, and cries: "The driver for this device has been disabled. Perhaps the necessary functions are performed by another driver. (Code 32). Click the "Diagnostics" button to start the diagnostics wizard for this device." But this does not affect flight safety ;-)