Translator's note: this article is quite old, but it has not lost its relevance. Whenever async/await comes up, a link to it tends to appear. I could not find a Russian translation, so I decided to help those who are not fluent in English.
For a long time, asynchronous programming was the realm of only the most experienced developers with a penchant for masochism: those who had enough free time, dedication and mental capacity to reason about callbacks within callbacks in a non-linear flow of execution. With the Microsoft .NET Framework 4.5, C# and Visual Basic bring asynchrony to all of us, so that mere mortals can now write asynchronous methods almost as easily as synchronous ones. Callbacks are no longer needed. Explicitly marshaling code from one synchronization context to another is no longer necessary. You no longer have to worry about how results or exceptions propagate. There is no need for tricks that bend the programming language just to make asynchronous code bearable. In short, no more hassle, no more headache.
Of course, even though it is now easy to start writing asynchronous methods (see the articles by Eric Lippert and Mads Torgersen in this issue of MSDN Magazine [October 2011]), doing it really well requires an understanding of what happens under the hood. Whenever a language or library raises the level of abstraction a developer can work at, that gain is inevitably accompanied by hidden costs. In many cases these costs are negligible and can safely be ignored by most programmers most of the time. However, advanced developers should understand exactly which costs exist, so they can take measures if problems ever manifest themselves. That is the case with the asynchronous programming facilities in C# and Visual Basic.
In this article, I will describe the ins and outs of asynchronous methods, explain how they are implemented, and discuss some of the costs involved. Note that these are not recommendations to distort readable code into something hard to maintain in the name of micro-optimization. This is simply knowledge that will help you diagnose problems you may run into, along with a set of tools for overcoming them. Also note that this article is based on a preview of the .NET Framework 4.5, and specific implementation details are likely to change before the final release.
For decades, programmers have used high-level languages such as C#, Visual Basic, F# and C++ to develop efficient applications. This experience has taught them the approximate costs of various operations, and that knowledge has shaped best development practices. For example, in most cases calling a synchronous method is relatively cheap, even more so when the compiler can inline the body of the callee directly at the call site. Developers have therefore become accustomed to breaking code into small, maintainable methods without worrying about the consequences of an increased number of calls. The mental model of these programmers is built around method calls.
With the advent of asynchronous methods, a new mental model is required. The C# and Visual Basic compilers create the illusion that an asynchronous method works just like its synchronous counterpart, while under the hood everything is completely different. The compiler generates an enormous amount of code on the programmer's behalf, very similar to the boilerplate developers used to write by hand to support asynchrony. Moreover, the compiler-generated code calls into library functions in the .NET Framework, further reducing the amount of work the programmer has to do. To have the correct mental model, and to make informed decisions with it, it is important to understand what the compiler generates for you.
With synchronous code, calling a method with an empty body is practically free. With asynchronous methods, this is not the case. Consider the following asynchronous method consisting of a single statement (which, due to the absence of await operators, will execute synchronously):
public static async Task SimpleBodyAsync() { Console.WriteLine("Hello, Async World!"); }
An intermediate language (IL) decompiler reveals the true contents of this function after compilation, showing something like Figure 1. What was a simple one-liner has turned into two methods, one of which lives on an auxiliary state machine type. The first is a stub method with a signature matching what the programmer wrote (the same name, the same accessibility, the same parameters and return type), but it contains none of the programmer's code, only setup boilerplate. The setup code initializes the state machine that represents the asynchronous method and kicks it off with a call to its MoveNext method. The state machine type holds the execution state of the asynchronous method, allowing that state to persist across asynchronous suspension points when necessary. It also contains the code the programmer wrote, rewritten to route results and exceptions into the returned Task object, to track the current position in the method so that execution can resume from that position after a suspension, and so on.
Figure 1 Asynchronous Method Pattern
[DebuggerStepThrough] public static Task SimpleBodyAsync() { <SimpleBodyAsync>d__0 d__ = new <SimpleBodyAsync>d__0(); d__.<>t__builder = AsyncTaskMethodBuilder.Create(); d__.MoveNext(); return d__.<>t__builder.Task; } [CompilerGenerated] [StructLayout(LayoutKind.Sequential)] private struct <SimpleBodyAsync>d__0 : <>t__IStateMachine { private int <>1__state; public AsyncTaskMethodBuilder <>t__builder; public Action <>t__MoveNextDelegate; public void MoveNext() { try { if (this.<>1__state == -1) return; Console.WriteLine("Hello, Async World!"); } catch (Exception e) { this.<>1__state = -1; this.<>t__builder.SetException(e); return; } this.<>1__state = -1; this.<>t__builder.SetResult(); } ... }
When you estimate what calls to asynchronous methods cost, keep this pattern in mind. The try/catch block in the MoveNext method will likely prevent the JIT compiler from inlining it, so at the very least we now pay for a method call, whereas with a synchronous method of this size there would probably be no call at all (its body would be inlined at the call site). We also pay for several calls into Framework routines (such as SetResult), and for several writes to fields of the state machine object. Of course, we need to weigh all of this against the cost of Console.WriteLine, which will probably dominate (it involves locking, I/O and so on). Notice, too, the optimizations the runtime performs for you. For example, the state machine is implemented as a struct. That struct is boxed to the managed heap only if the method needs to suspend while waiting for an operation to complete, and that never happens in this simple method. So this asynchronous method's boilerplate will not require any heap allocation. The compiler and runtime try hard to keep the number of allocations to a minimum.
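To see this synchronous completion for yourself, you can inspect the task an await-free async method hands back the moment it returns (a minimal sketch; the method body mirrors the example above):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    // An async method with no awaits: the compiler still generates the
    // state-machine pattern, but MoveNext runs to completion on the
    // caller's thread before the Task is ever returned.
    public static async Task SimpleBodyAsync()
    {
        Console.WriteLine("Hello, Async World!");
    }

    static void Main()
    {
        Task t = SimpleBodyAsync();
        // Because there was no await, the task is already completed
        // by the time the method returns it.
        Console.WriteLine(t.IsCompleted); // True
    }
}
```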
The .NET Framework tries to generate efficient implementations for asynchronous methods, applying multiple optimizations. However, developers often have domain knowledge that enables optimizations which would be risky or inappropriate for the compiler and runtime to apply automatically, given that they must target the general case. With this in mind, it can actually be beneficial to avoid async methods in certain situations, in particular for library methods that will be used in finer-grained scenarios. Usually this is when it is known that the method may complete synchronously because the data it depends on is already available.
When designing the infrastructure for asynchronous methods, the .NET Framework developers spent a lot of time optimizing away memory allocations, because allocation carries the largest performance cost in the async infrastructure. The act of allocating memory for an object is typically quite cheap. Allocating objects is like filling a shopping cart at a supermarket: you spend nothing while putting items into the cart; it's when you check out that you must take out your wallet and pay real money. And while allocation is easy, the subsequent garbage collection can really hurt application performance. Garbage collection involves scanning and marking the objects currently in memory that are still referenced. The more objects have been allocated, the longer this marking takes. Also, the more large objects are allocated, the more often garbage collections are needed. This aspect of allocation has a global effect on the system: the more garbage produced by asynchronous methods, the slower the overall application will run, even if micro-benchmarks don't show a significant slowdown.
For an asynchronous method that suspends its execution (awaiting data that is not yet available), the runtime must construct a Task object to return from the method, since that object serves as a unique reference to that invocation. However, many asynchronous method calls can complete without ever suspending. In those cases the runtime may return a cached, already-completed Task object, reused again and again without ever allocating new Tasks. This is only possible under certain conditions, though: for example, when the asynchronous method returns a non-generic Task, or a Task<TResult> where TResult is a reference type and the method's result is null. While the set of such conditions may grow over time, you can usually do better yourself if you know how the operation you're implementing will be used.
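A common hand-rolled version of this optimization is to cache completed tasks for result values that repeat, such as booleans. The following is a sketch of the idea, not Framework code; the class and method names are illustrative:

```csharp
using System;
using System.Threading.Tasks;

static class CachedTasks
{
    // Two already-completed tasks cover every possible bool result,
    // so the synchronous path never allocates a new Task<bool>.
    private static readonly Task<bool> s_true = Task.FromResult(true);
    private static readonly Task<bool> s_false = Task.FromResult(false);

    public static Task<bool> FromBool(bool value)
    {
        return value ? s_true : s_false;
    }
}

class Program
{
    static void Main()
    {
        // Repeated calls hand back the very same Task instance.
        Console.WriteLine(ReferenceEquals(
            CachedTasks.FromBool(true), CachedTasks.FromBool(true))); // True
    }
}
```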
Consider a type like MemoryStream. MemoryStream derives from Stream and overrides the new ReadAsync, WriteAsync and FlushAsync methods introduced in .NET 4.5 in order to provide MemoryStream-specific optimizations. Because a read is performed against a buffer that already resides in memory, that is, it is really just a memory copy, performance is best if ReadAsync executes synchronously. Implementing this with an asynchronous method might look like this:
public override async Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { cancellationToken.ThrowIfCancellationRequested(); return this.Read(buffer, offset, count); }
Simple enough. And since Read is a synchronous call, and there are no await operators in the method, all calls to this ReadAsync will in fact complete synchronously. Now consider a standard stream usage pattern, such as a copy operation:
byte[] buffer = new byte[0x1000]; int numRead; while((numRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0) { await destination.WriteAsync(buffer, 0, numRead); }
Note that in this example ReadAsync on the source stream is always called with the same count (the buffer's length), which means the return value (the number of bytes read) is very likely to repeat as well. Except in rare circumstances, the runtime's built-in Task caching won't apply to the values a ReadAsync implementation returns, but you can cache them yourself.
Consider another implementation of this method, shown in Figure 2. By exploiting the specifics of the standard usage scenarios for this method, we can optimize allocations away in a manner the runtime could hardly be expected to do for us: we can completely eliminate the allocation by handing back the same Task object returned from the previous ReadAsync call whenever the same number of bytes was read. For a low-level operation like this one, which is likely to be very fast and called repeatedly, this optimization can have a significant effect, especially on the number of garbage collections.
Figure 2 Task Creation Optimization
private Task<int> m_lastTask; public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { if (cancellationToken.IsCancellationRequested) { var tcs = new TaskCompletionSource<int>(); tcs.SetCanceled(); return tcs.Task; } try { int numRead = this.Read(buffer, offset, count); return m_lastTask != null && numRead == m_lastTask.Result ? m_lastTask : (m_lastTask = Task.FromResult(numRead)); } catch(Exception e) { var tcs = new TaskCompletionSource<int>(); tcs.SetException(e); return tcs.Task; } }
A similar technique of eliminating unnecessary Task allocations can be used when caching is needed anyway. Consider a method whose purpose is to download the contents of a web page and cache them for subsequent requests. As an asynchronous method, this might be written as follows (using the System.Net.Http.dll library that is new in .NET 4.5):
private static ConcurrentDictionary<string,string> s_urlToContents = new ConcurrentDictionary<string,string>(); public static async Task<string> GetContentsAsync(string url) { string contents; if (!s_urlToContents.TryGetValue(url, out contents)) { var response = await new HttpClient().GetAsync(url); contents = response.EnsureSuccessStatusCode().Content.ReadAsString(); s_urlToContents.TryAdd(url, contents); } return contents; }
This is the straightforward implementation. For calls to GetContentsAsync that don't find the data in the cache, the overhead of constructing a new Task object is negligible compared to the cost of downloading over the network. But when the data is found in the cache, that overhead becomes significant: it's an allocation made just to wrap and hand back already-available local data.
To eliminate this cost (if you need to achieve high performance), you can rewrite the method as shown in Figure 3. We now have two methods: a synchronous public method, and a private asynchronous method to which the public one delegates. The dictionary now caches the generated Task objects rather than their contents, so future attempts to download a page that has already been fetched successfully can be satisfied with a simple dictionary lookup that returns an existing Task. Internally we also take advantage of the ContinueWith method of the Task object, which lets us store the Task into the dictionary once it has completed, but only if the page loaded successfully. Of course, this code is more complicated and takes more effort to write and maintain, so, as with any performance optimization: don't spend time writing it until performance testing shows that the complication brings a measurable, meaningful improvement. Whether it does depends on how the method is used. Take a test suite that models common usage patterns and analyze the results to determine whether the game is worth the candle.
Figure 3 Manually Caching Tasks
private static ConcurrentDictionary<string,Task<string>> s_urlToContents = new ConcurrentDictionary<string,Task<string>>(); public static Task<string> GetContentsAsync(string url) { Task<string> contents; if (!s_urlToContents.TryGetValue(url, out contents)) { contents = GetContentsInternalAsync(url); contents.ContinueWith(delegate { s_urlToContents.TryAdd(url, contents); }, CancellationToken.None, TaskContinuationOptions.OnlyOnRanToCompletion | TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default); } return contents; } private static async Task<string> GetContentsInternalAsync(string url) { var response = await new HttpClient().GetAsync(url); return response.EnsureSuccessStatusCode().Content.ReadAsString(); }
Another Task-related optimization is deciding whether an asynchronous method needs to return a Task at all. Both C# and Visual Basic support asynchronous methods that return void, and for those no Task object is allocated at all. Asynchronous methods exposed from libraries should always return Task or Task<TResult>, because as a library author you can't know whether a consumer will want to await the method's completion. In application code, however, void-returning asynchronous methods have their place. The main reason they exist is to support existing event-driven environments, such as ASP.NET and Windows Presentation Foundation (WPF). With async and await, they make it easy to implement button handlers, page-load events and the like. If you do use an async void method, be careful with exception handling: exceptions that escape such a method are raised on whatever SynchronizationContext was current at the time the method was invoked.
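The contrast is easy to demonstrate in a console sketch: with a Task-returning method, a thrown exception travels inside the returned Task, where the caller can observe it (method names here are illustrative):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    // Task-returning: the exception is stored in the returned Task,
    // where a caller can observe and handle it.
    public static async Task ThrowsAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("boom");
    }

    static void Main()
    {
        try
        {
            ThrowsAsync().Wait();
        }
        catch (AggregateException ae)
        {
            // Wait wraps the stored exception in an AggregateException.
            Console.WriteLine(ae.InnerException.Message); // boom
        }
        // Had ThrowsAsync been declared 'async void', there would be no Task
        // to observe: the exception would instead be re-raised on the
        // SynchronizationContext current when the method was invoked.
    }
}
```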
There are many kinds of contexts in the .NET Framework: LogicalCallContext, SynchronizationContext, HostExecutionContext, SecurityContext, ExecutionContext and others (from the sheer number you might suspect the Framework's designers were given financial incentives to introduce new contexts; I know for certain they weren't). Some of these contexts strongly affect asynchronous methods, not only in terms of functionality, but also in terms of performance.
SynchronizationContext

SynchronizationContext plays a big role in asynchronous methods. A "synchronization context" is simply an abstraction over marshaling delegate invocation in a manner specific to a particular library or framework. For example, WPF provides DispatcherSynchronizationContext to represent the UI thread of a Dispatcher: posting a delegate to this synchronization context queues it for execution by the Dispatcher on its thread. ASP.NET provides AspNetSynchronizationContext, which ensures that asynchronous operations participating in the processing of an ASP.NET request execute serially and are associated with the right HttpContext state. And so on. All told, there are around 10 SynchronizationContext implementations in the .NET Framework, some public, some internal.
When you await a Task, or the other awaitable types the .NET Framework supports, the awaiter (for example TaskAwaiter) captures the current SynchronizationContext at the moment the await begins. When the awaited operation completes, if a SynchronizationContext was captured, the continuation of the asynchronous method is posted to that synchronization context. Thanks to this, developers writing asynchronous methods invoked from a UI thread don't need to manually marshal back to the UI thread in order to update UI controls: the Framework performs the marshaling automatically.
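To make the capture visible, here is a toy SynchronizationContext that merely counts how many continuations get posted to it (illustrative only; a real context such as WPF's queues work to a specific thread rather than running it inline):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A minimal SynchronizationContext: counts posted delegates, then runs
// them inline with itself installed as the current context, the way a
// real UI context keeps itself current on its own thread.
class CountingContext : SynchronizationContext
{
    public int Posts;

    public override void Post(SendOrPostCallback d, object state)
    {
        Interlocked.Increment(ref Posts);
        var prev = Current;
        SetSynchronizationContext(this);
        try { d(state); }
        finally { SetSynchronizationContext(prev); }
    }
}

class Program
{
    public static async Task RunAsync()
    {
        await Task.Delay(10); // awaiter captures the current SynchronizationContext
        await Task.Delay(10); // ...and each continuation is posted back to it
    }

    static void Main()
    {
        var ctx = new CountingContext();
        SynchronizationContext.SetSynchronizationContext(ctx);

        RunAsync().Wait(); // safe: our toy context runs work on pool threads
        Console.WriteLine(ctx.Posts); // 2 (one post per await)
    }
}
```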
Unfortunately, this marshaling has a price. For application developers who use await to implement their control flow, automatic marshaling is almost always the right behavior: their code generally needs to run in the right context, for example to touch UI controls or to access the HttpContext associated with the right ASP.NET request. Library code is usually a different story: it typically has no such requirement, so automatic marshaling often brings entirely unnecessary overhead. Consider once more the code that copies data from one stream to another:
byte[] buffer = new byte[0x1000]; int numRead; while((numRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0) { await destination.WriteAsync(buffer, 0, numRead); }
If this copy is invoked from a UI thread, every read and write operation will force execution back onto the UI thread. With a megabyte of source data and streams whose reads and writes complete asynchronously (as most implementations do), that means roughly 500 hops from background threads to the UI thread. To address this, the Task and Task<TResult> types provide the ConfigureAwait method. It accepts a boolean continueOnCapturedContext parameter that controls this marshaling behavior. If true (the default), the await automatically returns to the captured SynchronizationContext. If false, the synchronization context is ignored, and the runtime simply continues the asynchronous method on whatever thread the awaited operation completed on. Applying this gives a more efficient version of the stream-copying code:
byte[] buffer = new byte[0x1000]; int numRead; while((numRead = await source.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0) { await destination.WriteAsync(buffer, 0, numRead).ConfigureAwait(false); }
For library developers, this performance gain alone is reason enough to use ConfigureAwait(false) everywhere, except in the rare cases where the library knows enough about its execution environment to genuinely need to run with access to the right context.
Besides performance, there is another reason to use ConfigureAwait when developing libraries. Imagine that CopyStreamToStreamAsync, implemented without ConfigureAwait, is called from the UI thread in WPF, for example like this:
private void button1_Click(object sender, EventArgs args) { Stream src = …, dst = …; Task t = CopyStreamToStreamAsync(src, dst); t.Wait(); // deadlock! }
Here, the developer should have written button1_Click as an asynchronous method that awaits the Task, rather than blocking on it with the synchronous Wait method. Wait has its appropriate uses, but using it to block on a Task from the UI thread, as shown here, is almost always a mistake. Wait will not return until the Task completes. In the case of CopyStreamToStreamAsync, the internal awaits try to post their continuations back to the captured SynchronizationContext, and the method can't complete until those posted continuations run (they're needed to make further progress). But those continuations can't run, because the UI thread that is supposed to process them is blocked in the call to Wait. This is a circular dependency, resulting in deadlock. If CopyStreamToStreamAsync had been implemented with ConfigureAwait(false), there would be no circular dependency, and no deadlock.
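The fix is to keep the handler asynchronous all the way: await the task instead of blocking on it, so the UI thread stays free to process the posted continuations. Here is a console-app sketch of the pattern (MemoryStream stands in for the real streams; a console app has no SynchronizationContext, so the blocking Wait in Main is safe here, unlike on a UI thread):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class Program
{
    // Same copy loop as in the article.
    public static async Task CopyStreamToStreamAsync(Stream source, Stream destination)
    {
        byte[] buffer = new byte[0x1000];
        int numRead;
        while ((numRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await destination.WriteAsync(buffer, 0, numRead);
        }
    }

    // The handler-style method awaits (rather than calling t.Wait()),
    // returning control to its caller while the copy runs.
    public static async Task HandleClickAsync()
    {
        var src = new MemoryStream(new byte[10000]);
        var dst = new MemoryStream();
        await CopyStreamToStreamAsync(src, dst); // no blocking, no deadlock
        Console.WriteLine(dst.Length); // 10000
    }

    static void Main()
    {
        HandleClickAsync().Wait();
        Console.WriteLine("done");
    }
}
```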
ExecutionContext

ExecutionContext is an important part of the .NET Framework, yet most programmers are blissfully unaware of its existence. ExecutionContext is, in effect, a container for other contexts, such as SecurityContext and LogicalCallContext, and it represents everything that should flow across asynchronous points in your code. Whenever you use ThreadPool.QueueUserWorkItem, Task.Run, Delegate.BeginInvoke, Stream.BeginRead, WebClient.DownloadStringAsync or almost any other asynchronous operation in the Framework, the current ExecutionContext is captured, and the supplied delegate is later invoked through ExecutionContext.Run under that captured context (when possible). For example, if the code calling ThreadPool.QueueUserWorkItem was impersonating a Windows identity, that same identity will be impersonated while the supplied WaitCallback runs. And if the code calling Task.Run had data stored in LogicalCallContext, that same data will be accessible through LogicalCallContext inside the supplied Action. ExecutionContext also flows across awaits on tasks.
The Framework contains numerous optimizations that avoid capturing an ExecutionContext, and avoid running under a captured one, when doing so is unnecessary, because capturing and flowing it can be quite expensive. However, actions such as impersonating a Windows identity or storing data into LogicalCallContext (via WindowsIdentity.Impersonate or CallContext.LogicalSetData) defeat these optimizations and noticeably increase the cost of executing asynchronous methods.
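When you schedule work that you know does not depend on ambient state such as impersonation or LogicalCallContext data, the capture can be suppressed explicitly. A small sketch (only safe when the callback truly needs no flowed context):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        var done = new ManualResetEvent(false);

        // Suppressing ExecutionContext flow avoids the capture/restore
        // cost for work items that don't rely on ambient context.
        using (ExecutionContext.SuppressFlow())
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Console.WriteLine("ran without flowed context");
                done.Set();
            });
        } // flow is restored for the current thread when the scope ends

        done.WaitOne();
    }
}
```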
Beyond contexts, there is another source of cost in asynchronous methods that the C# and Visual Basic compilers handle for you: preserving local state across awaits. Locals that must survive an await cannot stay on the stack, because the continuation may run later, possibly on a different thread. So the compilers "lift" such locals into fields on the generated state machine type, and when the method first suspends at an await, the state machine struct is boxed onto the heap so that those locals survive the suspension.
The size of that boxed object depends on how much state must be preserved: the more locals that need to survive across awaits, the larger the state machine type, and the larger the heap allocation made when the method suspends.
The C# and Visual Basic compilers are sometimes more conservative about this lifting than strictly necessary. Consider, for example, the following method:
public static async Task FooAsync() { var dto = DateTimeOffset.Now; var dt = dto.DateTime; await Task.Yield(); Console.WriteLine(dt); }
The dto variable is never read after the await point, so the value written to it before the await does not need to be preserved across the suspension. Nevertheless, the state machine type generated by the compiler still contains a field for dto:
Figure 4 Lifted Locals in the State Machine Type
[StructLayout(LayoutKind.Sequential), CompilerGenerated] private struct <FooAsync>d__0 : <>t__IStateMachine { private int <>1__state; public AsyncTaskMethodBuilder <>t__builder; public Action <>t__MoveNextDelegate; public DateTimeOffset <dto>5__1; public DateTime <dt>5__2; private object <>t__stack; private object <>t__awaiter; public void MoveNext(); [DebuggerHidden] public void <>t__SetMoveNextDelegate(Action param0); }
As a result, the boxed state machine is slightly larger than it needs to be. If you find that garbage collection costs are higher than expected in your asynchronous methods, look at whether you really need all the state you're keeping alive. By shortening a local's lifetime yourself, you allow the compiler to omit its field entirely. Rewriting the method as follows removes the dto field from the state machine:
public static async Task FooAsync() { var dt = DateTimeOffset.Now.DateTime; await Task.Yield(); Console.WriteLine(dt); }
Furthermore, the .NET garbage collector (GC) is a generational collector, meaning it partitions objects into groups known as generations: at a high level, new objects are allocated in generation 0, and objects that survive a collection are promoted to the next generation (the .NET GC uses generations 0, 1 and 2). This makes collections faster, since the GC can often limit its work to a subset of all known objects. The model rests on the philosophy that newly allocated objects tend to die quickly, while objects that have already been around for a while tend to stay around. The consequence is that if an object survives generation 0, it will likely live on for some time, continuing to put pressure on the system all the while. And that means we really want objects to become eligible for collection as soon as they're no longer needed.
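The generational behavior is easy to observe directly (a small illustration; exact promotion behavior can vary with GC configuration):

```csharp
using System;

class Program
{
    static void Main()
    {
        object o = new object();
        // New objects start life in generation 0.
        Console.WriteLine(GC.GetGeneration(o)); // 0

        // The object is still referenced, so it survives the collection
        // and is promoted; this typically prints 1.
        GC.Collect();
        Console.WriteLine(GC.GetGeneration(o));
    }
}
```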
With the lifting described above, locals that in a synchronous method would quickly become garbage now remain referenced by the state machine. In a synchronous method, once the JIT compiler determines that a local will never be accessed again, the garbage collector can stop treating it as a root, and the object it referenced can be collected. In asynchronous methods, lifted locals are fields of a heap object and remain rooted until the whole method completes. If you find that large objects are staying alive longer than expected, consider explicitly setting locals that reference them to null once they are no longer needed. As usual, do this only if measurement shows it helps, because, like the other techniques discussed here for C# and Visual Basic, it makes the code harder to read and maintain.
The C# and Visual Basic compilers give you a lot of freedom in where you can place awaits: almost anywhere an expression may appear. When an await on a Task occurs as part of a larger expression, however, the intermediate results of that expression computed so far must also be preserved across the suspension, just like locals are. For example, consider the following method:
public static async Task<int> SumAsync(Task<int> a, Task<int> b, Task<int> c) { return Sum(await a, await b, await c); } private static int Sum(int a, int b, int c) { return a + b + c; }
When the C# compiler reaches "await b" here, the result of "await a" has already been computed as an argument to Sum. That intermediate value lives on the CLR evaluation stack, and since this is an async method whose stack cannot be preserved across a suspension (the continuation may run on a different thread), the compiler must "spill" the stack into the state machine, using the <>t__stack field. By the time "await c" is reached, two intermediate values (the results of awaiting a and b) must be preserved, so the compiler may generate, for example, a Tuple<int, int> to hold them in <>t__stack, an extra heap allocation. You can avoid this spilling by hoisting the awaits out of the larger expression into their own statements, so the evaluation stack is empty at each await point. SumAsync can be rewritten as follows:
public static async Task<int> SumAsync(Task<int> a, Task<int> b, Task<int> c) { int ra = await a; int rb = await b; int rc = await c; return Sum(ra, rb, rc); }
With this change, the compiler lifts ra, rb and rc into fields on the state machine, and no tuple allocation occurs. The state machine type grows slightly as a result, but lifting into fields is generally cheaper than allocating a new spill object. As with any such trade-off, measure: which form wins can depend on how the method is used and how often it actually suspends.
There is one more cost worth considering in this method: it awaits three times, and every await that actually suspends pays the full price of suspension and resumption. Sum can't do anything useful until all three values are available, so rather than awaiting each task individually, you can combine them with Task.WhenAll and await just once:
public static async Task<int> SumAsync(Task<int> a, Task<int> b, Task<int> c) { int[] results = await Task.WhenAll(a, b, c); return Sum(results[0], results[1], results[2]); }
The Task.WhenAll method returns a Task<TResult[]> that completes when all the supplied tasks have completed, yielding an array of their results. This reduces the number of suspensions to at most one, but it brings allocations of its own: WhenAll must allocate the Task it returns, the params array passed to it is itself an allocation, and so is the results array. So while we've reduced the number of suspensions, we've increased the number of allocations, and which effect dominates depends on usage. If this combining pattern sits on a hot path, you can go further and add a fast path for the case where all the tasks have already completed, as shown in Figure 5. Note that the internal method there awaits the non-generic Task.WhenAll overload (by casting the first task to Task), avoiding the allocation of a results array.
Figure 5 Applying Multiple Optimizations
public static Task<int> SumAsync(Task<int> a, Task<int> b, Task<int> c) { return (a.Status == TaskStatus.RanToCompletion && b.Status == TaskStatus.RanToCompletion && c.Status == TaskStatus.RanToCompletion) ? Task.FromResult(Sum(a.Result, b.Result, c.Result)) : SumAsyncInternal(a, b, c); } private static async Task<int> SumAsyncInternal(Task<int> a, Task<int> b, Task<int> c) { await Task.WhenAll((Task)a, b, c).ConfigureAwait(false); return Sum(a.Result, b.Result, c.Result); }
Asynchronous methods are a powerful productivity tool, making it much easier to write responsive client applications and scalable servers. But remember that an asynchronous method is not free: it compiles into a state machine, with costs for allocation, for context capture and marshaling, and for suspension and resumption. For the vast majority of applications these costs are negligible, and async methods should be your default. For performance-critical paths, however, it pays to understand what the .NET Framework does on your behalf. The .NET Framework strives to generate efficient implementations for asynchronous methods and will continue to improve them in future releases; meanwhile, know what the Framework is doing for you, measure, and optimize only where measurement shows it matters.
Source: https://habr.com/ru/post/458332/