
Proper performance evaluation for diagnosing and fixing issues with .Net serialization

Dear readers! I present to you a translation of the article by Scott Hanselman entitled "Proper benchmarking to diagnose and solve a .NET serialization bottleneck".

For starters, a few caveats and comments. First, benchmarking is hard. It is hard to measure. But the real problem is that we often forget WHY we are measuring the performance of something. We take a complex multi-machine financial system and suddenly become extremely focused on one piece of serialization code that we have decided is THE problem: "If I can just optimize this serialization by writing a 10,000-iteration for-loop and shave off x milliseconds, everything will be fine."

Second, this is not a post with performance comparison results. Do not link to it and say "look! Library X is better than library Y! Or .Net is better than Java!" Instead, treat it as an instructive story and a set of general recommendations. I am simply using this story to emphasize the points below.



One bad benchmark


A reader recently emailed me with questions about serialization in .Net. The team had read a very old post from 2009 about performance, complete with graphs and charts, and had run some tests of their own. They found that serializing tens of thousands of elements took more than 700 milliseconds and produced about 2 megabytes of output. The test serialized their typical data structures in both C# and Java using a range of libraries, including their company's own serializer, binary .Net DataContract, and JSON.NET. In one case serialization produced a small payload (1.8 MB for a large structure); in another it ran quickly (94 ms), but there was no clear winner. The reader was close to despair and had concluded, more or less, that .Net was not the right tool for their problem.

In my opinion, there is something wrong with a benchmark like this. It is not clear what was measured. It is not clear whether the measurements were reliable, and more to the point, the sweeping conclusion that ".Net is slow" is not justified by the data presented.

Hmm... So .Net cannot serialize tens of thousands of data structures quickly? I know that it can.

See also: "Create benchmarks and results that have value" and "Responsible benchmarking" by @Kellabyte

I'm not an expert, but I still played a little with this code.

First, are we measuring correctly?


The tests used DateTime.UtcNow, which is not a good choice for this kind of measurement.
 startTime = DateTime.UtcNow;
 resultData = TestSerialization(foo);
 endTime = DateTime.UtcNow;

Do not use DateTime.Now or DateTime.UtcNow for measurements where any precision is required. DateTime does not have enough precision and can be off by as much as 30 ms.

DateTime represents the date and time. This is not a high-precision timer or stopwatch.

As Eric Lippert says:
In short, "what time is it?" and "how long did that take?" are completely different questions; don't use a tool designed to answer one question to answer the other.

And as Raymond Chen says:
Precision is not the same as accuracy. Accuracy is how close you are to the correct answer; precision is how much resolution you have for that answer.

So, we use Stopwatch where we need a stopwatch. Before I moved the example over to Stopwatch, I was getting values in milliseconds like 90, 106, 103, 165, 94; after switching to Stopwatch, the results were 99, 94, 95, 95, 94. The jitter in the values became significantly smaller.
 Stopwatch sw = new Stopwatch();
 sw.Start();
 // stuff
 sw.Stop();
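
Filled in for the serialization test above, a minimal sketch might look like this; TestSerialization and foo are the reader's own method and data from the earlier snippet, and the extra warm-up call is there so that JIT compilation is not part of the measurement.

 using System.Diagnostics;

 TestSerialization(foo);            // warm-up run: JIT and caches, not measured

 var sw = Stopwatch.StartNew();
 var resultData = TestSerialization(foo);
 sw.Stop();

 Console.WriteLine("Serialization took {0} ms", sw.ElapsedMilliseconds);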

Also, you may want to pin the process to a single processor core if you are trying to get a reliable measurement. While it should not matter, since Stopwatch uses the Win32 QueryPerformanceCounter (the source code for Stopwatch in .Net is here), there were problems on older systems when a test started on one processor and finished on another.
 // One Core
 var p = Process.GetCurrentProcess();
 p.ProcessorAffinity = (IntPtr)1;

If you are not going to use Stopwatch, look for a simple and well-regarded benchmarking library.

Second: counting the results


In the sample code that was given to me, about 10 lines contained the actual measurements, while 735 lines were "infrastructure" responsible for collecting and displaying the results. Sound familiar? It is fair to say that the benchmark can get lost in the "infrastructure".

Listen to my recent podcast with Matt Warren on "Performance as a Feature", take a look at Matt's performance blog, and be sure to check out Ben Watson's book "Writing High Performance .NET Code".

Also keep in mind that Matt is currently experimenting with building a compact benchmarking framework on GitHub. It looks quite promising and could reduce benchmarking to applying a [Benchmark] attribute directly inside your unit tests, roughly along the lines of the sketch below.
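
To give a flavor of the idea, here is a purely hypothetical sketch: a made-up [Benchmark] attribute plus a tiny reflection-based runner that warms each marked method up and then times it with Stopwatch. None of these names come from Matt's actual project; they only illustrate the attribute-driven approach.

 using System;
 using System.Diagnostics;
 using System.Linq;
 using System.Reflection;

 // Hypothetical marker attribute; not a real library API.
 [AttributeUsage(AttributeTargets.Method)]
 public class BenchmarkAttribute : Attribute { }

 public class SerializationBenchmarks
 {
     [Benchmark]
     public void SerializeSmallObject()
     {
         // Stand-in workload; in the reader's case this would call their serializer.
         Newtonsoft.Json.JsonConvert.SerializeObject(new { Id = 1, Name = "test" });
     }
 }

 public static class TinyRunner
 {
     // Finds [Benchmark]-marked methods, runs each once to warm up (JIT, caches),
     // then reports the average time over a few iterations.
     public static void Run<T>(int iterations = 5) where T : new()
     {
         var instance = new T();
         var methods = typeof(T).GetMethods()
             .Where(m => m.GetCustomAttribute<BenchmarkAttribute>() != null);

         foreach (var method in methods)
         {
             method.Invoke(instance, null); // warm-up run, not measured
             var sw = Stopwatch.StartNew();
             for (int i = 0; i < iterations; i++)
                 method.Invoke(instance, null);
             sw.Stop();
             Console.WriteLine("{0}: {1:F2} ms per run", method.Name,
                 sw.Elapsed.TotalMilliseconds / iterations);
         }
     }
 }

Calling TinyRunner.Run<SerializationBenchmarks>() from a console app or a unit test is then all the "infrastructure" the measurement needs.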

Consider using an existing framework for your simple benchmarks. One of them is SimpleSpeedTester by Yan Cui. It produces nice tables and does a lot of the tedious work for you. Here are some screenshots I "borrowed" from Yan's blog.


A slightly more advanced tool worth looking at is HdrHistogram, a library "designed for recording histograms of value measurements in latency and performance sensitive applications." It is also on GitHub and includes implementations in Java, C, and C#.


And seriously. Use a profiler.

Third: did you run the profiler?


Use the Visual Studio profiler, or download a trial of the Redgate ANTS Performance Profiler or the JetBrains dotTrace profiler.

What is our application actually spending its time on? I think we have all met people who build complex benchmarks and study a black box instead of just running a profiler.



By the way: are there newer / better-suited / better-understood ways to solve the problem?


This is just my opinion, but I think it deserves attention, and there are numbers to back it up. Some of the serialization code in .Net is quite old, written in 2003 or 2005, and may not take advantage of newer techniques and knowledge. It is also fairly flexible, "one size fits all" code, as opposed to highly specialized code.

People have different serialization needs. You cannot serialize something as XML and expect the result to be small and compact. Likewise, you cannot serialize a structure as JSON and expect it to be as fast as binary serialization.
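
To make that trade-off concrete, here is a minimal sketch that measures how many bytes the same object produces as XML and as JSON, using the built-in DataContract serializers; Foo is a hypothetical stand-in for whatever structure you actually care about.

 using System.IO;
 using System.Runtime.Serialization;
 using System.Runtime.Serialization.Json;

 public static class PayloadSize
 {
     // Bytes produced by the DataContract XML serializer.
     public static long AsXml<T>(T value)
     {
         using (var ms = new MemoryStream())
         {
             new DataContractSerializer(typeof(T)).WriteObject(ms, value);
             return ms.Length;
         }
     }

     // Bytes produced by the DataContract JSON serializer.
     public static long AsJson<T>(T value)
     {
         using (var ms = new MemoryStream())
         {
             new DataContractJsonSerializer(typeof(T)).WriteObject(ms, value);
             return ms.Length;
         }
     }
 }

 // Usage: Console.WriteLine("XML: {0} bytes, JSON: {1} bytes",
 //                          PayloadSize.AsXml(foo), PayloadSize.AsJson(foo));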

Measure your code, analyze your requirements, take a step back and consider all the options.

Fourth: New .Net serializers worth considering


Now that I had an understanding of what was being measured and how to measure elapsed time reliably, it became clear that those serializers did not meet our reader's goals. Some, as I said, were written long ago. So what are the more modern options?

There are two really good serializers to keep an eye on: Jil from Kevin Montrose, and protobuf-net from Marc Gravell. Both are awesome, and the breadth of frameworks supported and the build system around protobuf-net are just lovely. There are also other impressive serializers included in ServiceStack.NET that support not only JSON but also JSV and CSV.

Protobuf-net - protocol buffers for .NET


Protocol buffers is a data structure format from Google, and protobuf-net is a high-performance implementation of protocol buffers for .NET. Think of it as XML, only smaller and faster, with cross-language serialization as well. Here is what their site says:

Protocol buffers have many advantages over XML for serializing structured data. They:
  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

It was easy to add. There are many ways to decorate your data structures, but essentially:
 var r = ProtoBuf.Serializer.Deserialize<List<DataItem>>(memInStream); 
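
As a rough sketch of what that decoration can look like (DataItem here is a hypothetical stand-in for the reader's own structure), the attribute-based approach plus a round trip through a MemoryStream might be:

 using System.Collections.Generic;
 using System.IO;
 using ProtoBuf;

 [ProtoContract]
 public class DataItem
 {
     [ProtoMember(1)] public int Id { get; set; }
     [ProtoMember(2)] public string Name { get; set; }
 }

 // Round trip: write the list into a stream, rewind, and read it back.
 var items = new List<DataItem> { new DataItem { Id = 1, Name = "first" } };
 using (var memInStream = new MemoryStream())
 {
     Serializer.Serialize(memInStream, items);
     memInStream.Position = 0;
     var r = Serializer.Deserialize<List<DataItem>>(memInStream);
 }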

The numbers I received with protobuf-net were exceptional, and in this case the data was packed tightly and quickly, taking only 49ms.

JIL - JSON serializer for .NET using Sigil


Jil is a JSON serializer that is less flexible than Json.NET, making that small sacrifice in the name of speed. As they put it:
Flexibility and cool features are explicitly ignored in pursuit of speed.
It is also worth noting that some serializers work with an in-memory string, while others, such as Json.NET and DataContractSerializer, can work with streams. That means you should take the size of what you are going to serialize into account when choosing a library.
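
As a small illustration of that difference, assuming Json.NET and a hypothetical foo object: the first call below builds the entire JSON document as one string in memory, while the second writes it incrementally to a stream.

 using System.IO;
 using Newtonsoft.Json;

 // String-based: the whole document is buffered in memory first.
 string json = JsonConvert.SerializeObject(foo);

 // Stream-based: written as it is produced, which matters for large object graphs.
 using (var stream = File.Create("foo.json"))
 using (var writer = new StreamWriter(stream))
 using (var jsonWriter = new JsonTextWriter(writer))
 {
     new JsonSerializer().Serialize(jsonWriter, foo);
 }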

Jil impresses in many ways, not least because it dynamically emits a custom serializer (much like the XmlSerializers of old).

Jil is extremely easy to use. It just works. I added it to the example and it serialized in 84ms.
 result = Jil.JSON.Deserialize<Foo>(jsonData); 
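
For reference, a minimal round trip with Jil might look like this; Foo and foo are again hypothetical stand-ins for the reader's type and instance.

 using Jil;

 // Serialize to a JSON string and read it back.
 string jsonData = JSON.Serialize(foo);
 Foo result = JSON.Deserialize<Foo>(jsonData);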


Conclusion: Comparing Performance Is Not So Easy


What do you measure? Why do you measure it? Do your methods match your use cases? Do you serialize one large object or thousands of small ones?

James Newton-King shared a nice thought with me:
"[There is] a meta-problem around benchmarking. Micro-optimization and caring about performance when it doesn't matter is something many developers are guilty of. Documentation, developer productivity, and flexibility are more important than a hundredth of a millisecond."

James pointed out an old (but recently fixed) ASP.NET bug on Twitter. It is a bug that genuinely affects performance, yet it pales next to the time it takes to send data over the network.
This bug supports the idea that many developers care about performance in places where it doesn't matter
- James Newton-King (@JamesNK) February 13, 2015

Thanks to Marc Gravell and James Newton-King for their help in preparing this post.

The original article is available at this link.

Source: https://habr.com/ru/post/258577/

