
“Monitoring .NET application performance: approaches and tools,” an interview with Dina Goldstein



A developed solution does not always perform acceptably, especially for the customer. And when the proposal to buy more memory and raise the system requirements does not work (it has never worked for me), you have to tackle optimization. For that we have more than just Stopwatch: we talked about the tools that show where to look, where to dig first, and what results to expect when working on application performance with Dina Goldstein, an excellent specialist and a speaker at the DotNext 2016 Moscow conference.

Dina is a senior software engineer at Aternity, which develops monitoring tools for millions of PCs and mobile devices. Dina works on the team responsible for the core mechanism that collects data from various sources.

About monitoring solutions


- What ready-made monitoring solutions exist? How popular are they, and what tasks do they solve?

- Monitoring tools fall into two main groups: built-in infrastructure and separate standalone programs.

The first category includes infrastructures such as ETW (Event Tracing for Windows) and performance counters, which come preinstalled with Windows. You can use off-the-shelf data collection solutions, or embed components into your own tool through the .NET (or C++) API. These solutions will certainly help you implement exactly what you want, but the key word is implement: you mostly have to do the work yourself. And yes, you can always hook Windows API calls to get even more data, but that requires a special approach and extra care.
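
As a minimal sketch of the performance counter side of this, here is what reading a counter through the .NET API can look like. This assumes Windows and the `System.Diagnostics.PerformanceCounter` class (on .NET Core it comes from the `System.Diagnostics.PerformanceCounter` NuGet package); the counter and class names are the standard ones, but the program structure is illustrative only:

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Threading;

class CounterSketch
{
    // Pure formatting helper, kept separate so it is easy to test.
    public static string FormatSample(string counter, float value) =>
        $"{counter}: {value.ToString("F1", CultureInfo.InvariantCulture)}%";

    static void Main()
    {
        // Windows-only: reads the same counter that perfmon.exe shows.
        using var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        cpu.NextValue();            // the first sample is always 0; it only primes the counter
        Thread.Sleep(1000);         // give the counter an interval to average over
        Console.WriteLine(FormatSample("CPU", cpu.NextValue()));
    }
}
```

Note the two-call pattern: counters like "% Processor Time" are rate-based, so a single `NextValue()` has no interval to average over and returns 0.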

Among the ready-made tools there are many products: both libraries that embed into your code, such as New Relic, and complete monitoring systems that require no programmer intervention, such as Aternity, where I happen to work.

- Sooner or later a situation arises where the ready-made tools are not enough and you have to write code in your own product. Do the .NET Framework and the language offer anything to simplify that development?

- Yes, definitely. Performance counters have a convenient .NET API, and you can work with ETW through a freely available NuGet package called TraceEvent, developed by Microsoft. The company recently open-sourced it, by the way; the repository is now available on GitHub.
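
To make TraceEvent less abstract, here is a minimal real-time ETW session sketch using the `Microsoft.Diagnostics.Tracing.TraceEvent` NuGet package (Windows only; real-time kernel sessions need administrator rights). The session and class names here are assumptions chosen for the example:

```csharp
using System;
using Microsoft.Diagnostics.Tracing.Parsers;
using Microsoft.Diagnostics.Tracing.Session;

class EtwSketch
{
    // Pure formatting helper, separated out so it can be tested without ETW.
    public static string Describe(string process, int pid) =>
        $"Process started: {process} (PID {pid})";

    static void Main()
    {
        // Open a real-time kernel session and subscribe to process-start events.
        using var session = new TraceEventSession("DemoMonitoringSession");
        session.EnableKernelProvider(KernelTraceEventParser.Keywords.Process);
        session.Source.Kernel.ProcessStart += data =>
            Console.WriteLine(Describe(data.ProcessName, data.ProcessID));
        session.Source.Process();   // blocks and pumps events into the callbacks above
    }
}
```

The same `session.Source` object exposes parsers for CLR events (GC, JIT, exceptions) as well, which is what makes TraceEvent convenient for .NET monitoring.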

Those were a few words about the general approach. Individual .NET-based frameworks (such as WCF, WPF and Entity Framework) have extension points where you can embed your code and collect the data you need. Another interesting option is ClrMD, a set of debugging APIs for .NET (also open source, supported by Microsoft, available on GitHub and via NuGet).

With ClrMD you can get call stacks, examine the managed heap, and so on. I do not think large-scale monitoring is feasible with this tool alone, if only because the overhead is too high, but once some other tool has pointed at a problem, ClrMD can give you the specifics. For example, if you find that the application's heap is too large, you can determine which objects consume the most memory.
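
The "which objects consume the most memory" scenario can be sketched with ClrMD (the `Microsoft.Diagnostics.Runtime` NuGet package, v2.x API): attach to a live process, walk the managed heap, and group object sizes by type. The program shape is an assumption for illustration; the ClrMD calls are the standard ones:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Diagnostics.Runtime;

class HeapSketch
{
    // Pure aggregation: given (type name, object size) pairs, return the top-n consumers.
    public static List<(string Type, long Bytes)> TopConsumers(
        IEnumerable<(string Type, long Size)> objects, int n) =>
        objects.GroupBy(o => o.Type)
               .Select(g => (Type: g.Key, Bytes: g.Sum(o => o.Size)))
               .OrderByDescending(t => t.Bytes)
               .Take(n)
               .ToList();

    static void Main(string[] args)
    {
        int pid = int.Parse(args[0]);

        // Suspending the target keeps the heap consistent while we walk it --
        // exactly the overhead the interview warns about, so do this rarely.
        using var target = DataTarget.AttachToProcess(pid, suspend: true);
        ClrRuntime runtime = target.ClrVersions[0].CreateRuntime();

        var samples = runtime.Heap.EnumerateObjects()
            .Select(o => (o.Type?.Name ?? "<unknown>", (long)o.Size));

        foreach (var (type, bytes) in TopConsumers(samples, 10))
            Console.WriteLine($"{bytes,12:N0}  {type}");
    }
}
```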

If you want to dive deeper into the Win32 API, then I am afraid .NET will not help much, and you will have to resort to C++, for example with Microsoft Detours.

You must not lose control.


- Our software product went into operation. Have we lost control or are there still ways to collect information? What is the best way to accomplish this?

- No! You do not lose control. If we are talking about a server or cloud product, you can of course use any monitoring solution that suits your needs, since the environment is completely under your control.

If you are developing a desktop application, you cannot require users to install a monitoring framework. This means the application has to include the ability to monitor itself. You can use ready-made libraries here, such as New Relic, or build it yourself with the .NET APIs I have already mentioned.

To tell the truth, lately I have been trying to convince my manager to add processor-load monitoring to our application: it would work through performance counters and, when a certain threshold is reached, start collecting call-stack statistics via ETW.
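
A rough sketch of that threshold idea, assuming Windows and `System.Diagnostics.PerformanceCounter`: poll the CPU counter, and only escalate to expensive ETW collection after sustained load, not a single spike. The class, threshold, and three-sample rule are all assumptions for this example:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CpuWatchdog
{
    // Pure decision logic: trigger only when the last three samples all exceed
    // the threshold, so a one-second spike does not start an ETW session.
    public static bool ShouldTrigger(float[] samples, float threshold) =>
        samples.Length >= 3 &&
        samples[^1] > threshold && samples[^2] > threshold && samples[^3] > threshold;

    static void Main()
    {
        using var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        var window = new float[3];              // circular buffer of recent samples

        for (int i = 0; ; i = (i + 1) % 3)
        {
            Thread.Sleep(1000);
            window[i] = cpu.NextValue();
            if (ShouldTrigger(window, 80f))
            {
                Console.WriteLine("Sustained high CPU, starting call-stack collection...");
                // here one would start a TraceEventSession with CPU sampling enabled
                break;
            }
        }
    }
}
```

Because all three slots must exceed the threshold, the order of the circular buffer does not matter for the decision.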

- What measurement error is considered acceptable, and how can we achieve it?

- I believe accuracy is not the main goal. What you really want is a general understanding of how the product behaves. Monitoring a released product is not the same as debugging and profiling during development. There are always ways to get more data in production, for example with ClrMD, but high accuracy comes at a price. You can, of course, do that occasionally, but I still see no point in doing it all the time. You can attach a profiler or debugger for a short while when faced with performance problems. Ideally, monitoring should give you the general picture, and when a problem is detected you go back to the office, reproduce it locally and examine it with specialized tools.

- Suppose you need to measure the performance of a method, but there is no guarantee that outside factors, such as the garbage collector, will not affect the result. What are the pitfalls of such measurements, and how do you avoid them?

- It depends on what is meant by the method's performance. If we are talking about "algorithmic" performance, there is probably no need to measure in a real production environment; instead you can write a local test, suspend garbage collection, disable the paging file, plug the laptop into power, and generally do whatever you consider necessary.
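
A minimal sketch of such a local test, using only `Stopwatch` and the GC API (the helper name and warm-up/iteration scheme are assumptions; for serious work a harness like BenchmarkDotNet does all of this and more):

```csharp
using System;
using System.Diagnostics;

class MicroBench
{
    // Returns the average wall-clock time of one call, in milliseconds.
    public static double MeasureMilliseconds(Action action, int iterations)
    {
        action();                        // warm-up: JIT-compile before timing
        GC.Collect();                    // start the timed region from a clean heap
        GC.WaitForPendingFinalizers();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    static void Main()
    {
        double perCall = MeasureMilliseconds(
            () => string.Join(",", new[] { "a", "b", "c" }), 100_000);
        Console.WriteLine($"{perCall * 1000:F3} microseconds per call");
    }
}
```

The warm-up call and the pre-collection are exactly the "do everything you think necessary" part: they remove JIT compilation and leftover garbage from the measurement, which otherwise dominate short runs.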

But in real life we have no control over when these interruptions happen. What we can do is find out when interruptions took place and how long they lasted. ETW provides information about the garbage collector, the network, disk accesses, and so on. You can then analyze all this data together and draw conclusions about performance. Specifically for the garbage collector, if there are critical sections you can use GC.TryStartNoGCRegion.
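
The GC.TryStartNoGCRegion pattern looks roughly like this (the method name and budget size are assumptions for the sketch; the API itself is standard .NET, available since .NET Framework 4.6):

```csharp
using System;
using System.Runtime;

class NoGcRegionDemo
{
    // Runs a latency-critical section inside a no-GC region.
    // Returns true if the region was successfully entered.
    public static bool RunCriticalSection()
    {
        // Ask the runtime to reserve this much allocation budget up front;
        // TryStartNoGCRegion returns false if it cannot commit the memory.
        if (!GC.TryStartNoGCRegion(1024 * 1024))
            return false;

        try
        {
            var buffer = new byte[64 * 1024];   // allocations that must not trigger a collection
            return true;
        }
        finally
        {
            // If allocations exceeded the budget, the runtime may already have
            // ended the region with a collection, and EndNoGCRegion would throw.
            if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
                GC.EndNoGCRegion();
        }
    }
}
```

The key constraint is the budget: the region only guarantees no collection as long as allocations stay within the size requested up front.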

- Let's touch on a little pure theory. On large systems with large amounts of data there is a temptation to apply serious mathematics. Is there much theory behind this? How popular are statistical methods, and are they used in any tools?

- The tools I know of only collect data; it is then up to everyone to decide how to process it. One thing to keep in mind is that sampling is sometimes used instead of Windows API interception, which is much more expensive in terms of performance. By its nature, sampling can undercount rare events. This is especially noticeable when measuring processor usage, and you need to make sure measurements run long enough to statistically capture even small actions. On the other hand, those little things are clearly not what causes performance problems, and they probably do not matter at all if you only want an idea of what takes more time and what takes less. And you definitely cannot get exact execution times from such measurements.

About monitoring at the system design stage


- Is it necessary to lay expansion points for future monitoring during system design? Are there established approaches and recommendations?

- Yes, definitely! I did not mention this before, but you can create your own performance counters and ETW logs. So during system design you should think about which places might be interesting and what data should be collected there. Ideally, the system should be designed so that DevOps can monitor everything they need without involving the programmers.
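
On the ETW side, building your own log into the application comes down to defining an `EventSource`. A minimal sketch, where the provider name "MyCompany-MyApp" and the event methods are assumptions chosen for the example:

```csharp
using System.Diagnostics.Tracing;

// A custom ETW/EventPipe provider baked into the application. Once it ships,
// operations staff can enable "MyCompany-MyApp" with PerfView, logman or
// dotnet-trace without any code changes -- the design-time goal from the interview.
[EventSource(Name = "MyCompany-MyApp")]
sealed class AppEventSource : EventSource
{
    public static readonly AppEventSource Log = new AppEventSource();

    [Event(1, Level = EventLevel.Informational)]
    public void RequestStarted(string url) => WriteEvent(1, url);

    [Event(2, Level = EventLevel.Informational)]
    public void RequestFinished(string url, long elapsedMs) => WriteEvent(2, url, elapsedMs);
}

// Usage at the interesting points identified during design:
//   AppEventSource.Log.RequestStarted("/api/orders");
//   AppEventSource.Log.RequestFinished("/api/orders", 42);
```

When no listener has enabled the provider, `WriteEvent` is close to free, which is what makes it reasonable to leave these calls in production code.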

- Should we wait for help from development tools, such as Visual Studio?

- Again, it depends on the environment. Visual Studio certainly has excellent profiling tools for use during development, but do not expect help from it while the application is in production. First, profiling has a large overhead. Second, you cannot install Visual Studio on users' computers, if only because of licensing restrictions. If there is a remote connection to the server and you are allowed to stop the application for a while, you can use the Visual Studio Remote Debugger. But that is still debugging, not monitoring.

- Many are faced with the need to process a huge amount of raw data obtained during monitoring. Is it possible to automate?

- Yes, of course. If you are using ETW, you can take TraceEvent and write some .NET code that analyzes events online or offline. If you are using a different data source and the results come out in a standard format, you can use any programming language you prefer for the analysis. And, of course, nowadays there is an incredible number of ready-made programs and platforms for analysis that process the data, present it nicely, and display it dynamically on a dashboard.

- What are the problems now facing monitoring tools? And in what direction is the development going?

- I think the most pressing problems are overhead and data overload. We cannot allow large performance drops due to monitoring itself. In addition, the user must be able to change what is monitored and how it is configured dynamically, without recompiling or restarting the application. This is especially true for stock exchanges and the military.

Another thing that is particularly lacking on Windows is the ability to dynamically intercept arbitrary Windows API calls, something like eBPF on Linux. And since we are faced with an abundance of data, dashboards that can group and dynamically display information according to constantly changing requirements are becoming increasingly popular.



Dina speaks in the first slot at the DotNext 2016 Moscow conference, which will be held on December 9 at the Radisson Slavyanskaya Hotel. You can register here.

In addition to Dina's talk, you can also hear:

.NET Core: State of the art
Squeezing the Hardware to Make Performance Juice
Intellectual chatbots and cognitive services
Stack Overflow - It's all about performance!
Advanced Xamarin.Forms
C++ via C#
We continue to talk about arithmetic
ASP.NET SignalR: Cruise for Web Development
Exceptional Exceptions in .NET
.NET code modification in runtime
End-to-end JIT
Performance tuning Stack Overflow tags
C# Scripting - more than you might have thought!
Multithreading Deep Dive
Collect All, or Meet Cake (C # Make)
WinDbg Superpowers for .NET Developers
Overview of the new .NET Core and .NET Platform Standard
What vulnerabilities are found in the .NET platform and how not to repeat them in your own applications
What's new in C # 7?

Source: https://habr.com/ru/post/313640/
