
Troubleshooting using WinDbg, Sos and Sosex


Image: Julien Dumont, Flickr


Unfortunately, there are situations when a system stops working or starts to consume resources uncontrollably, and neither logs nor system metrics can help. The situation is further aggravated by the fact that in production there is no Visual Studio or any other debugger on the machine, remote debugging is impossible, and more often than not there is not even a way to access that machine at all. In this case, the only option is to analyze a memory dump of the process. I want to describe a few common scenarios of hunting for problems in such dumps: deadlocks, memory leaks and high CPU consumption.


The peculiarity of such problems is that they are extremely rarely reproduced on developers' machines, are difficult to cover with automated tests, and require considerable resources to investigate. Therefore, an important step in developing a system is to run a series of stress tests on hardware close to the one it will run on at the customer's site, and to analyze and fix such problems as soon as they appear.


A process memory dump can be taken in various ways. For example, in Task Manager you can select the process of interest and choose Create Dump File in its context menu. The ProcDump utility by Mark Russinovich helps me a lot: with it you can capture different types of dumps on different triggers. For example, you can take a full memory dump of a process at the moment it consumes more than 30% of the CPU for 10 seconds. Overall, it is a very useful utility.
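For reference, a few ProcDump invocations I find handy (the process name MyApp.exe is just a placeholder; check procdump -? for the authoritative list of switches): a full dump taken immediately, a full dump when CPU usage stays above 30% for 10 consecutive seconds, and a full dump on a first-chance exception.


 procdump.exe -ma MyApp.exe
 procdump.exe -ma -c 30 -s 10 MyApp.exe
 procdump.exe -ma -e 1 MyApp.exe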


You can analyze dumps with various tools, for example in Visual Studio or with the ClrMD library. I will use WinDbg and Sos — not because they have any special advantages, it is simply what I am comfortable with. An indispensable assistant for me has also been the Sosex plugin, which greatly simplifies finding and analyzing problems.


Before we begin, I'll note that for convenience you should set up debugging symbols in WinDbg. This process has been described many times on the Internet, so I will not repeat it here and will just give a link.
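A typical setup, assuming you cache symbols locally in C:\symbols and pull the rest from the public Microsoft symbol server, looks like this:


 .sympath srv*C:\symbols*https://msdl.microsoft.com/download/symbols
 .reload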


Let us consider some real scenarios. Almost all of the scenarios described below arose in actual practice, and I or my colleagues had to deal with them at some point. But, in order not to violate trade secrets, I have prepared several demonstration examples that illustrate the problems; they can be found here. For plausibility, we will analyze not a live process but its dump. To do this, capture a full memory dump of the process with the command procdump.exe -ma PID. So, let's begin.


Search for deadlocks (example 1, simple)


(Example 01-MonitorDeadlock)


When looking for deadlocks, Sosex can help a lot. Suppose we have noticed that our application does not respond to any commands — it hung. What do we do?
Run WinDbg (importantly, of the same bitness as the application) and load the captured dump (press Ctrl+D and select the dump file in the dialog). Then load the necessary extensions:


 .loadby sos clr
 .load D:\utils\sosex.dll

Sosex has a great dlk command that can look for deadlocks between sync blocks and/or ReaderWriterLock objects. Run it and look at the result.


 0:011> !dlk
 Examining SyncBlocks...
 Scanning for ReaderWriterLock(Slim) instances...
 Scanning for holders of ReaderWriterLock locks...
 Scanning for holders of ReaderWriterLockSlim locks...
 Examining CriticalSections...
 Scanning for threads waiting on SyncBlocks...
 Scanning for threads waiting on ReaderWriterLock locks...
 Scanning for threads waiting on ReaderWriterLocksSlim locks...
 Scanning for threads waiting on CriticalSections...
 *DEADLOCK DETECTED*
 CLR thread 0x4 holds the lock on SyncBlock 000000e2a8343ae8 OBJ:000000e2aa064348[System.Object]
 ...and is waiting for the lock on SyncBlock 000000e2a8343a98 OBJ:000000e2aa064330[System.Object]
 CLR thread 0x3 holds the lock on SyncBlock 000000e2a8343a98 OBJ:000000e2aa064330[System.Object]
 ...and is waiting for the lock on SyncBlock 000000e2a8343ae8 OBJ:000000e2aa064348[System.Object]
 CLR Thread 0x4 is waiting at MonitorDeadlock.Program.Thread2()(+0x2e IL,+0x8f Native) [D:\Projects\DebugExamples\MonitorDeadlock\Program.cs @ 45,5]
 CLR Thread 0x3 is waiting at MonitorDeadlock.Program.Thread1()(+0x2e IL,+0x8f Native) [D:\Projects\DebugExamples\MonitorDeadlock\Program.cs @ 33,5]
 1 deadlock detected.

We got lucky! Sosex immediately showed us that there is a deadlock, which objects it arose on and which threads are involved. Using the threads command, we get the complete list of managed threads and match the managed thread IDs to the internal WinDbg IDs.


 0:011> !threads
                                                                                             Lock
        ID OSID ThreadOBJ        State   GC Mode    GC Alloc Context                  Domain            Count Apt Exception
   0    1  db8 000000e2a82929e0 202a020 Preemptive  000000E2AA06C0A0:000000E2AA06DFD0 000000e2a82860e0  0     MTA
   5    2  5d0 000000e2a830f9d0   2b220 Preemptive  0000000000000000:0000000000000000 000000e2a82860e0  0     MTA (Finalizer)
   9    3 27bc 000000e2a835c210 202b220 Preemptive  000000E2AA06A0D8:000000E2AA06BFD0 000000e2a82860e0  1     MTA
  10    4 27ac 000000e2a835e5e0 202b220 Preemptive  000000E2AA06EB50:000000E2AA06FFD0 000000e2a82860e0  1     MTA

We see that threads 3 and 4, which we are interested in, each hold a lock (Lock Count column equal to 1). Now let's look at the stacks of these threads to see where the locking happened. To do this, we call the ~Ne!clrstack command, where N is the internal thread identifier in the debugger (9 and 10). Part of the output is hidden for brevity.


 0:011> ~9e!clrstack
 OS Thread Id: 0x27bc (9)
         Child SP               IP Call Site
 000000e2c2d4ee58 00007ffeb0de3dda [GCFrame: 000000e2c2d4ee58]
 000000e2c2d4efc8 00007ffeb0de3dda [GCFrame: 000000e2c2d4efc8]
 000000e2c2d4f008 00007ffeb0de3dda [HelperMethodFrame_1OBJ: 000000e2c2d4f008] System.Threading.Monitor.Enter(System.Object)
 000000e2c2d4f100 00007ffe2a46088f MonitorDeadlock.Program.Thread1() [D:\Projects\DebugExamples\MonitorDeadlock\Program.cs @ 33]
 000000e2c2d4f160 00007ffe88b1af17 System.Threading.Tasks.Task.Execute()

 0:011> ~10e!clrstack
 OS Thread Id: 0x27ac (10)
         Child SP               IP Call Site
 000000e2c2f4eaf8 00007ffeb0de3dda [GCFrame: 000000e2c2f4eaf8]
 000000e2c2f4ec68 00007ffeb0de3dda [GCFrame: 000000e2c2f4ec68]
 000000e2c2f4eca8 00007ffeb0de3dda [HelperMethodFrame_1OBJ: 000000e2c2f4eca8] System.Threading.Monitor.Enter(System.Object)
 000000e2c2f4eda0 00007ffe2a4609df MonitorDeadlock.Program.Thread2() [D:\Projects\DebugExamples\MonitorDeadlock\Program.cs @ 45]
 000000e2c2f4ee00 00007ffe88b1af17 System.Threading.Tasks.Task.Execute()

Excellent! Now we see the methods that caused the deadlock. We have a classic example of two locks taken in opposite order.


 private static void Thread1()
 {
     lock (Lock1)
     lock (Lock2)
     {
         // ...
     }
 }

 private static void Thread2()
 {
     lock (Lock2)
     lock (Lock1)
     {
         // ...
     }
 }
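The usual remedy for this kind of deadlock is to agree on a single lock-acquisition order for every code path. A minimal sketch, assuming both methods can be rewritten to take the locks in the same order:


 // Both methods now acquire Lock1 before Lock2, so a circular wait is impossible.
 private static void Thread1()
 {
     lock (Lock1)
     lock (Lock2)
     {
         // ... work that needs both resources
     }
 }

 private static void Thread2()
 {
     lock (Lock1)
     lock (Lock2)
     {
         // ... work that needs both resources
     }
 }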

Search for deadlocks (example 2, not so simple)


(Example 02-RwlDeadlock)


But sometimes there are situations where the dlk command does not work.


 0:016> !dlk
 Examining SyncBlocks...
 Scanning for ReaderWriterLock(Slim) instances...
 Scanning for holders of ReaderWriterLock locks...
 Scanning for holders of ReaderWriterLockSlim locks...
 Examining CriticalSections...
 No deadlocks detected.

What to do in this case? We still suspect there is a deadlock. First, let's look at what synchronization blocks exist in the process and who holds them.


 0:016> !SyncBlk -all
 Index         SyncBlock MonitorHeld Recursion Owning Thread Info          SyncBlock Owner
     1 00000082dde314a8            0         0 0000000000000000     none    00000082dfc6ac88 Microsoft.Win32.UnsafeNativeMethods+ManifestEtw+EtwEnableCallback
     2 00000082dde314f8            0         0 0000000000000000     none    00000082dfc6b340 System.__ComObject
     3 00000082dde31548            0         0 0000000000000000     none    00000082dfc6b360 System.EventHandler`1[[Windows.Foundation.Diagnostics.TracingStatusChangedEventArgs, mscorlib]]
     4 0000000000000000            0         0 0000000000000000     none           0 Free
     5 00000082dde315e8            0         0 0000000000000000     none    00000082dfc6bf40 Microsoft.Win32.UnsafeNativeMethods+ManifestEtw+EtwEnableCallback

No luck: not a single synchronization block is held. The lock may instead be on a ReaderWriterLock object. We can check this with the Sosex rwlock command.


 0:016> !rwlock
 ReaderWriterLock instances...
          Address ReaderCount WaitingReaderCount WriterThread WaitingWriterCount
 -----------------------------------------------------------------------------------------
 000000d380002b38           0                  0           --                  0
 000000d380008fc8           0                  0           --                  3

 0:016> !rwlock 000000d380008fc8
 WriterThread:            0x0
 WriterLevel:             0
 WaitingWriterCount:      3
 WriterEvent:             3d8
 WaitingWriterThreadIds:  0x1,0x3,0x6
 ReaderCount:             0
 CurrentReaderThreadIds:  None
 WaitingReaderCount:      0
 ReaderEvent:             384
 WaitingReaderThreadIds:  None

Great! We see that there is a ReaderWriterLock object with 3 threads waiting for the writer lock. Let's look at the stacks of these threads (don't forget to map the CLR thread IDs to the internal debugger IDs using the threads command).


 0:016> ~0e!clrstack
 OS Thread Id: 0x1ea0 (0)
         Child SP               IP Call Site
 000000d3f69ae388 00007ffeb0de3dda [GCFrame: 000000d3f69ae388]
 000000d3f69ae3c8 00007ffeb0de3dda [HelperMethodFrame_1OBJ: 000000d3f69ae3c8] System.Threading.ReaderWriterLock.FCallUpgradeToWriterLock(System.Threading.LockCookie ByRef, Int32)
 000000d3f69ae4d0 00007ffe89376fcd System.Threading.ReaderWriterLock.UpgradeToWriterLock(Int32)
 000000d3f69ae510 00007ffe2a440c4f RwlDeadlock.UpgradableRwlDeadlock.GetExpensiveObject(System.String) [D:\Projects\DebugExamples\RwlDeadlock\UpgradableRwlDeadlock.cs @ 34]
 000000d3f69ae5a0 00007ffe2a440b6c RwlDeadlock.UpgradableRwlDeadlock.b__2_0(Int32) [D:\Projects\DebugExamples\RwlDeadlock\UpgradableRwlDeadlock.cs @ 15]
 …

 0:016> ~9e!clrstack
 OS Thread Id: 0x310c (9)
         Child SP               IP Call Site
 000000d3f969e8d8 00007ffeb0de3dda [GCFrame: 000000d3f969e8d8]
 000000d3f969e918 00007ffeb0de3dda [HelperMethodFrame_1OBJ: 000000d3f969e918] System.Threading.ReaderWriterLock.FCallUpgradeToWriterLock(System.Threading.LockCookie ByRef, Int32)
 000000d3f969ea20 00007ffe89376fcd System.Threading.ReaderWriterLock.UpgradeToWriterLock(Int32)
 000000d3f969ea60 00007ffe2a440c4f RwlDeadlock.UpgradableRwlDeadlock.GetExpensiveObject(System.String) [D:\Projects\DebugExamples\RwlDeadlock\UpgradableRwlDeadlock.cs @ 34]
 000000d3f969eaf0 00007ffe2a440b6c RwlDeadlock.UpgradableRwlDeadlock.b__2_0(Int32) [D:\Projects\DebugExamples\RwlDeadlock\UpgradableRwlDeadlock.cs @ 15]

We have found an example of incorrect use of the UpgradeToWriterLock method of the ReaderWriterLock class.
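The full example code is not shown here, so I will not guess at the exact failure mode, but a common way to avoid upgrade-related hangs is ReaderWriterLockSlim with an upgradeable read lock: only one thread at a time can hold it, so concurrent upgraders cannot block one another. A sketch of a cache lookup in that style (the names GetExpensiveObject, _cache and Create are assumptions modeled on the stack above; requires System.Threading and System.Collections.Generic):


 private static readonly ReaderWriterLockSlim CacheLock = new ReaderWriterLockSlim();
 private static readonly Dictionary<string, object> _cache = new Dictionary<string, object>();

 private static object GetExpensiveObject(string key)
 {
     CacheLock.EnterUpgradeableReadLock();
     try
     {
         object value;
         if (_cache.TryGetValue(key, out value))
             return value;

         CacheLock.EnterWriteLock();
         try
         {
             // Re-check under the write lock, then create and cache the value.
             // Create(key) stands for whatever expensive construction the example performs.
             if (!_cache.TryGetValue(key, out value))
                 _cache[key] = value = Create(key);
             return value;
         }
         finally
         {
             CacheLock.ExitWriteLock();
         }
     }
     finally
     {
         CacheLock.ExitUpgradeableReadLock();
     }
 }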


From these examples, we can sketch an algorithm for analyzing deadlocks:


  1. We look for deadlocks using the dlk command.
  2. If no deadlocks are found, we analyze the synchronization objects using the syncblk and rwlock commands.
  3. We analyze the stacks of the waiting threads, looking for calls to blocking methods such as Monitor.Enter, Task.WaitAll, ReaderWriterLock.UpgradeToWriterLock, etc.
  4. We can also analyze the unmanaged stacks with the k command, looking for calls to ntdll!NtWaitForMultipleObjects or ntdll!NtWaitForSingleObject (see the example after this list).
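For step 4, I usually dump the native stacks of all threads at once (~*k), or switch to a specific thread and look at its stack (~Ns followed by k):


 ~*k
 ~5s
 k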

This algorithm helps me solve probably 80% of such problems. The remaining 20% require more non-trivial actions.


Memory leak search


(Example 03-MemoryLeak)


Sometimes a situation arises when an application allocates more and more memory without releasing it. In this case, you need to analyze the allocated memory for leaks. In .NET applications, a leak occurs when the garbage collector cannot collect objects because they are still reachable from GC roots. Typical sources of roots are static fields, local variables and arguments on thread stacks, GC handles (including pinned ones), and the finalization queue.



A memory leak can be seen on the process memory allocation graph. It can be obtained from performance counters such as Process\Private Bytes and .NET CLR Memory\# Bytes in All Heaps, but I find it more convenient to watch the Private Bytes graph in Process Explorer. If the graph shows that more and more memory is being allocated and never released, and the size of the second generation keeps growing, that is a strong hint that we have a memory leak. We capture a memory dump of the process and analyze it.


Load the captured dump into WinDbg and load sos and sosex as described above. Sosex has a great feature: it can build a heap index for later use (the bhi command). This index significantly speeds up the heap-analysis commands. For example, on a 5 GB dump the gcroot command took me about 20 minutes, while its sosex counterpart mroot, running on the pre-built index, completed in a couple of seconds.


First, let's look at the general heap memory allocation statistics.


 0:004> !HeapStat
 Heap             Gen0         Gen1         Gen2          LOH
 Heap0         1204664      6409008   1421466144       166272

 Free space:                                                      Percentage
 Heap0              24      3204400    153752240          184    SOH: 10% LOH: 0%

As expected, the second generation has grown to an incredible size. Let's look at the statistics of objects allocated in the heap using the sosex dumpgen command. The latest version of sosex lets you specify all as the generation and thus analyze the entire heap.


 0:004> !dumpgen 2 -stat
 Count        Total Size        Type
 -------------------------------------------------
 9            648               System.Int32[]
 4            760               System.Char[]
 2            1,072             System.Globalization.CultureData
 18           1,216             System.String[]
 58           3,248             System.RuntimeType
 180          7,486             System.String
 15,825       379,800           MemoryLeak.Worker
 15,824       1,012,736         System.EventHandler`1[[System.EventArgs, mscorlib]]
 15,719       153,720,802       **** FREE ****
 15,824       1,266,299,776     System.Int64[]
 63,545 objects, 1,421,434,137 bytes
 Total GEN 2 size: 1,421,466,144 bytes

It can be seen that more than 15 thousand EventHandler objects are allocated in the heap.

 0:004> !dumpgen 2 -type System.EventHandler`1[[System.EventArgs, mscorlib]]
 Object           MT               Size   Name
 -------------------------------------------------------------------
 000000b304b7de38 00007FFC2C1FA880 64     System.EventHandler`1[[System.EventArgs, mscorlib]]
 000000b304ba5018 00007FFC2C1FA880 64     System.EventHandler`1[[System.EventArgs, mscorlib]]
 000000b304bcc1f8 00007FFC2C1FA880 64     System.EventHandler`1[[System.EventArgs, mscorlib]]

Let's look at the fields of one of them (the last). The sos.do or sosex.mdt commands will help.


 0:004> !mdt 000000b304bcc1f8
 000000b304bcc1f8 (System.EventHandler`1[[System.EventArgs, mscorlib]])
     _target:000000b304bb8948 (MemoryLeak.Worker)
     _methodBase:NULL (System.Object)
     _methodPtr:00007ffbd0a500c0 (System.IntPtr)
     _methodPtrAux:0000000000000000 (System.IntPtr)
     _invocationList:NULL (System.Object)
     _invocationCount:0000000000000000 (System.IntPtr)

The event handler points to the Worker class object. Select several specific EventHandler objects and take a look at their roots.


 0:004> !mroot 000000b304bcc1f8
 AppDomain 000000b31b3fd720: GCHandle(Pinned) @ 000000b31b5317d8
     000000b32cfa5970[System.Object[]]
     000000b31cfa35e8[MemoryLeak.GlobalNotifier]
     000000b3052faaa0[System.EventHandler`1[[System.EventArgs, mscorlib]]]
     000000b32cfa9968[System.Object[]]
     000000b304bcc1f8[System.EventHandler`1[[System.EventArgs, mscorlib]]]

 0:004> !mroot 000000b304ba5018
 AppDomain 000000b31b3fd720: GCHandle(Pinned) @ 000000b31b5317d8
     000000b32cfa5970[System.Object[]]
     000000b31cfa35e8[MemoryLeak.GlobalNotifier]
     000000b3052faaa0[System.EventHandler`1[[System.EventArgs, mscorlib]]]
     000000b32cfa9968[System.Object[]]
     000000b304ba5018[System.EventHandler`1[[System.EventArgs, mscorlib]]]

It can be seen that both objects are rooted through the same instance of the GlobalNotifier class.


 0:004> !gch 000000b31b5317d8
 HandleObj        HandleType          Object           Size   Data   Type
 --------------------------------------------------------------------------------------
 000000b31b5317d8 Pinned              000000b32cfa5970 16344         System.Object[]
 --------------------------------------------------------------------------------------
 1 Handle

 0:004> !mdt 000000b31cfa35e8 -r
 000000b31cfa35e8 (MemoryLeak.GlobalNotifier)
     SomethingHappened:000000b3052faaa0 (System.EventHandler`1[[System.EventArgs, mscorlib]])
         _target:000000b3052faaa0 (System.EventHandler`1[[System.EventArgs, mscorlib]]) <RECURSIVE>
         _methodBase:NULL (System.Object)
         _methodPtr:00007ffbd093e5e0 (System.IntPtr)
         _methodPtrAux:00007ffc2c2d84a8 (System.IntPtr)
         _invocationList:000000b32cfa9968 (System.Object[], Elements: 16384)
         _invocationCount:0000000000003dff (System.IntPtr)

We can conclude that this is a static object whose event the Worker objects subscribe to. Let's analyze the threads and their stacks.


 0:004> ~0e!clrstack
 OS Thread Id: 0x9700 (0)
         Child SP               IP Call Site
 000000b31b2ee7a8 00007ffc2fe8f833 [HelperMethodFrame: 000000b31b2ee7a8]
 000000b31b2ee910 00007ffbd0a50614 MemoryLeak.Program.Main(System.String[]) [D:\Projects\DebugExamples\MemoryLeak\Program.cs @ 19]
 000000b31b2eebc0 00007ffc30094073 [GCFrame: 000000b31b2eebc0]

What happens in the Main method? Let's disassemble it. Sosex has a muf command that displays a combined listing of the source, the IL code and the generated assembly. When there is no access to the sources, it can be very convenient. (The output is truncated.)


 0:004> !muf 00007ffbd0a50614
 MemoryLeak.Program.Main(string[]): void
 obj:MemoryLeak.Worker

 Notifier.SomethingHappened += obj.SomethingHappened;
 IL_003d: ldsfld MemoryLeak.Program::Notifier
 IL_0042: ldloc.0  (obj)
 IL_0043: ldftn MemoryLeak.Worker::SomethingHappened(object, System.EventArgs)
 IL_0049: newobj System.EventHandler<System.EventArgs>::.ctor
 IL_004e: callvirt MemoryLeak.GlobalNotifier::add_SomethingHappened(System.EventHandler<System.EventArgs>)
 00007ffb`d0a50659 488bcb          mov     rcx,rbx
 00007ffb`d0a5065c 488bd7          mov     rdx,rdi
 00007ffb`d0a5065f 3909            cmp     dword ptr [rcx],ecx
 00007ffb`d0a50661 e892faffff      call    00007ffb`d0a500f8

 obj.Work();

 Thread.Sleep(50);
 IL_0063: ldc.i4.s 50 (0x32)
 IL_0065: call System.Threading.Thread::Sleep
 IL_006a: br.s IL_0037

 while (true)

We have a classic example of a memory leak through C# events. In an infinite loop, new Worker objects are created and subscribed to the GlobalNotifier.SomethingHappened event. They are never unsubscribed, and therefore the garbage collector cannot free the allocated memory.
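Based on the muf listing above, a sketch of one possible fix is to detach the handler as soon as the Worker has finished its work; the loop body below is my reconstruction, not the example's exact code:


 // Reconstructed loop: subscribing without ever unsubscribing is what keeps
 // every Worker (and its event handler) alive in gen 2.
 while (true)
 {
     var obj = new Worker();
     Notifier.SomethingHappened += obj.SomethingHappened;
     try
     {
         obj.Work();
     }
     finally
     {
         // The fix: unsubscribe so the Worker becomes unreachable and collectable.
         Notifier.SomethingHappened -= obj.SomethingHappened;
     }
     Thread.Sleep(50);
 }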


Let's try to create an approximate algorithm for analyzing memory leaks:


  1. Using the HeapStat command, get the heap memory statistics and analyze the generation sizes.
  2. Using the dumpheap -stat or dumpgen all -stat commands, analyze the statistics of allocated objects. Look for large numbers of application-specific objects.
  3. Using the dumpgen GEN -type TYPE command, get the list of addresses of the objects of interest.
  4. Using the mroot ObjAddress command, get the roots that keep these objects alive.
  5. Analyze the roots. For completeness, analyze the application thread stacks as well. Keep in mind that unreleased memory can accumulate over a long time, and the threads involved may have been terminated and/or returned to the pool; in that case stack analysis will not give results. The main tool is root analysis.

High CPU consumption analysis


(Example 04-CpuUtilization. Run the example and wait 10-15 minutes.)


Finally, let's consider the case when our system starts greedily consuming CPU resources. High CPU consumption usually indicates that the application has got stuck in an infinite loop, and often the only way to bring the load down is to restart the application.


Capture the process dump and analyze it. Using the .time command, let's see how much time the application spent in user mode and how much in kernel mode.


 0:008> .time
 Debug session time: Mon Apr  3 17:51:18.733 2017 (UTC + 3:00)
 System Uptime: 11 days 10:29:22.677
 Process Uptime: 0 days 0:08:52.766
   Kernel time: 0 days 0:00:13.562
   User time: 0 days 0:14:06.968

In my example, the program spent 13 seconds in kernel mode and 14 minutes in user mode. This means the threads are constantly doing some work and almost never sleep, which is rather suspicious. WinDbg has a runaway command that shows how the run time is distributed across threads.


 0:008> !runaway
  User Mode Time
   Thread       Time
    4:6598      0 days 0:02:57.687
    3:47b4      0 days 0:02:52.421
    5:3370      0 days 0:02:49.390
    6:6b1c      0 days 0:02:45.921
    7:bb58      0 days 0:02:19.687
    2:b478      0 days 0:00:17.062
    0:8e10      0 days 0:00:00.015
    8:6474      0 days 0:00:00.000
    1:97f0      0 days 0:00:00.000

With the default settings, runaway displays only user-mode time. We can see that four threads have been grinding away for about three minutes each; the output shows their OS thread IDs. Using the threads command, we get the internal debugger identifiers and, at the same time, see whether any exceptions have occurred.


 0:008> !threads
                                                                                             Lock
        ID OSID ThreadOBJ        State   GC Mode     GC Alloc Context                  Domain            Count Apt Exception
   0    1 8e10 000000eab2ad3690 202a020 Preemptive  0000000000000000:0000000000000000 000000eab2aaa950  0     MTA
   2    2 b478 000000eab2afb6f0   2b220 Preemptive  0000000000000000:0000000000000000 000000eab2aaa950  0     MTA (Finalizer)
   3    3 47b4 000000eab2b491a0 202b220 Preemptive  0000000000000000:0000000000000000 000000eab2aaa950  0     MTA System.Collections.Generic.KeyNotFoundException 000000eb4ff8f848
   4    4 6598 000000eab2b4a750   2b220 Preemptive  000000EBD8F91930:000000EBD8F91948 000000eab2aaa950  1024  MTA System.Collections.Generic.KeyNotFoundException 000000eb4ff8fd60
   5    5 3370 000000eab2b53a20 202b220 Preemptive  0000000000000000:0000000000000000 000000eab2aaa950  0     MTA System.Collections.Generic.KeyNotFoundException 000000eb4ff5d698
   6    6 6b1c 000000eab2b5a3d0 202b220 Preemptive  0000000000000000:0000000000000000 000000eab2aaa950  0     MTA System.Collections.Generic.KeyNotFoundException 000000eb4ff8fba8
   7    7 bb58 000000eab2b66680   21220 Cooperative 0000000000000000:0000000000000000 000000eab2aaa950  0     Ukn

It can be seen that KeyNotFoundException exceptions occurred in all threads of interest to us.


 0:008> ~3e!pe
 Exception object: 000000eb4ff8f848
 Exception type:   System.Collections.Generic.KeyNotFoundException
 Message:          .
 InnerException:   <none>
 StackTrace (generated):
     SP               IP               Function
     000000EACD15EF40 00007FFC2D46AF6F mscorlib_ni!System.Collections.Concurrent.ConcurrentDictionary`2[[System.Guid, mscorlib],[System.__Canon, mscorlib]].get_Item(System.Guid)+0x48720f
     000000EACD15EF80 00007FFBD0A4074A CpuUtilization!CpuUtilization.Program.ResolveAsset(System.Guid)+0x3a
 StackTraceString: <none>
 HResult: 80131577

Let's look at the stacks of threads (the output for one thread is given, the stacks of the others are similar).


 0:008> ~3e!clrstack
 OS Thread Id: 0x47b4 (3)
         Child SP               IP Call Site
 000000eacd15c968 00007ffc42d00c6a [GCFrame: 000000eacd15c968]
 000000eacd15caa8 00007ffc42d00c6a [GCFrame: 000000eacd15caa8]
 000000eacd15cae8 00007ffc42d00c6a [HelperMethodFrame_1OBJ: 000000eacd15cae8] System.Threading.Monitor.Enter(System.Object)
 000000eacd15cbe0 00007ffc2cfe3e9d System.Collections.Concurrent.ConcurrentDictionary`2[[System.Guid, mscorlib],[System.__Canon, mscorlib]].TryAddInternal(System.Guid, System.__Canon, Boolean, Boolean, System.__Canon ByRef)
 000000eacd15ccd0 00007ffc2cfe5235 System.Collections.Concurrent.ConcurrentDictionary`2[[System.Guid, mscorlib],[System.__Canon, mscorlib]].TryAdd(System.Guid, System.__Canon)
 000000eacd15cd10 00007ffbd0a40811 CpuUtilization.Program.Init(System.Guid) [D:\Projects\DebugExamples\CpuUtilization\Program.cs @ 64]
 000000eacd15cd40 00007ffbd0a407aa CpuUtilization.Program.ResolveAsset(System.Guid) [D:\Projects\DebugExamples\CpuUtilization\Program.cs @ 55]
 ...

Now we know where in the source code to look for the problem. If you look at the example's source, you can see an infinite loop in which reading from the ConcurrentDictionary fails every time.
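Judging by the stack and the exception, the loop keeps catching KeyNotFoundException and retrying. A hedged sketch of the kind of bug that produces this picture, and an alternative that avoids it (the types and the CreateAsset helper are assumptions, not the example's actual code; requires System, System.Collections.Concurrent and System.Collections.Generic):


 // Hypothetical reconstruction of the hot loop; the real example may differ in detail.
 private static readonly ConcurrentDictionary<Guid, object> Cache =
     new ConcurrentDictionary<Guid, object>();

 private static object ResolveAsset(Guid id)
 {
     while (true)
     {
         try
         {
             return Cache[id];              // get_Item throws if the key is missing
         }
         catch (KeyNotFoundException)
         {
             Init(Guid.NewGuid());          // bug: populates a different key, so the
         }                                  // lookup above fails again on every pass
     }
 }

 private static void Init(Guid id)
 {
     Cache.TryAdd(id, CreateAsset(id));
 }

 // Placeholder for whatever expensive construction the real example performs.
 private static object CreateAsset(Guid id) => new object();

 // A safer pattern: let the dictionary create the value for the requested key.
 private static object ResolveAssetFixed(Guid id)
 {
     return Cache.GetOrAdd(id, CreateAsset);
 }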


The general search algorithm is quite simple:


  1. Using the runaway command, find the most "voracious" threads.
  2. Analyze the stacks of these threads.

In real life everything is much more complicated: tens or hundreds of threads are spinning in the application, and millions of objects are allocated in the heap. This greatly complicates the analysis, but the steps described have helped me many times when analyzing problems in real systems. I hope the examples of using WinDbg, Sos and Sosex given here will make your life easier and save you some nerve cells.


All the source code for the examples can be found here.



Source: https://habr.com/ru/post/327524/

