Here is the essential truth about async in its purest form: there is no thread.
The objectors are legion. “No,” they cry, “if I am awaiting an operation, there has to be a thread somewhere doing the waiting! Maybe it is a thread-pool thread. Or an OS thread! Or something down in the device driver...”
Do not heed these cries. If the operation is truly asynchronous, then there is no thread.
The skeptics remain unconvinced. Very well, let us put them to shame.
We will trace the execution of an asynchronous operation all the way down to the hardware, paying particular attention to the .NET side and the device driver. The description is necessarily simplified and omits some details, but it does not stray far from the truth.
Consider some “write” operation (to a file, network stream, USB toaster, anywhere). Our code is simple:
private async void Button_Click(object sender, RoutedEventArgs e)
{
    byte[] data = ...
    await myDevice.WriteAsync(data, 0, data.Length);
}
We already know that the UI thread is not blocked during the wait. The question: is there some other thread that must sacrifice itself on the Altar of Blocking so that the UI thread may live?
Hold on. We will have to dive deep.
First stop: the library (for example, the BCL). Let us assume that WriteAsync is implemented on top of the standard asynchronous I/O support in the .NET Framework, which in turn is built on overlapped I/O*. So the library asks Win32 to start an overlapped I/O operation on the device handle.
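As an aside, here is a minimal sketch, in ordinary C#, of the general shape such a wrapper takes: start the operation, hand the completion machinery a callback, and immediately return a task that no thread is waiting on. DeviceWriterSketch is a made-up type, and Task.Delay merely stands in for the overlapped I/O plumbing; this is an illustration of the pattern, not the actual BCL code.

using System;
using System.IO;
using System.Threading.Tasks;

public static class DeviceWriterSketch
{
    // Begin the "write" and return a Task right away; nothing waits on it.
    public static Task WriteAsync(byte[] buffer, int offset, int count)
    {
        var tcs = new TaskCompletionSource<bool>();

        // Stand-in for the device + interrupt + completion-port chain. Like real
        // overlapped I/O, Task.Delay holds no thread while it is pending.
        Task.Delay(100).ContinueWith(_ =>
        {
            bool deviceReportedError = false;            // purely illustrative
            if (deviceReportedError)
                tcs.SetException(new IOException("Write failed."));
            else
                tcs.SetResult(true);                     // completes the awaited task
        });

        return tcs.Task;  // returned immediately; the "write" is still in flight
    }
}

Awaiting DeviceWriterSketch.WriteAsync(...) gives exactly the behavior described in this article: the task completes later, courtesy of a briefly borrowed pool thread, and in the meantime no thread exists for the operation.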
The OS, in turn, calls into the device driver and asks it to begin the write. The request takes the form of an object that describes the write operation; this object is called an I/O Request Packet (IRP).
The driver receives the IRP and issues a command telling the device to start writing the data. If the device supports direct memory access (DMA), this may amount to nothing more than writing the buffer address into a device register. That is all the driver can do: it marks the IRP as “pending” and returns control to the OS.

Here is the key point: the device driver is not allowed to block while processing an IRP. That means that if the IRP cannot be satisfied immediately, it must be processed asynchronously. This is true even for synchronous APIs! At the device-driver level, all (non-trivial) requests are asynchronous.
To quote one tome of wisdom: “Regardless of the type of I/O request, all I/O operations issued to the driver on behalf of the application are performed asynchronously.”
With the IRP in the “pending” state, the OS returns to the library, the library returns the not-yet-completed task to the button-click handler, the handler suspends the async method, and the UI thread carries on with its business.
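It is worth seeing, at least roughly, how that suspension happens. Below is a hand-simplified illustration of the state machine the compiler generates for Button_Click; the real generated code is considerably more elaborate, and myDevice and data are the same assumed members as in the example above.

private void Button_Click(object sender, RoutedEventArgs e)
{
    byte[] data = ...
    var awaiter = myDevice.WriteAsync(data, 0, data.Length).GetAwaiter();
    if (awaiter.IsCompleted)
    {
        awaiter.GetResult();     // already finished: just carry on
        return;
    }

    // Not finished yet: register "the rest of the method" as a continuation
    // and return to the message loop. Nothing blocks; nothing waits.
    awaiter.OnCompleted(() =>
    {
        awaiter.GetResult();     // rethrows any I/O exception
        // ...the code after the await would continue here...
    });
}

The important part is the early return: once the continuation is registered, the UI thread goes straight back to pumping messages.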
We have followed the request down into the very abyss of the system, all the way to the physical device.
Now the write operation is in flight. How many threads are carrying it out?
None.
There is no device-driver thread, no OS thread, no BCL thread, and no thread-pool thread performing the write.
There is no thread.
Now let us follow the response back up from the realm of the demons into the world of mortals.
Some time after the operation starts, the device finishes the write and notifies the CPU with an interrupt.
The driver's interrupt handler is invoked. An interrupt is a CPU-level event: whatever work the CPU was doing is temporarily suspended, regardless of which thread happened to be running. You could say the interrupt handler “borrows” the running thread, but I prefer to think of interrupt handlers as running at a level so low that the concept of a “thread” does not apply; they run “beneath” all threads, so to speak.
A well-written interrupt handler does as little as possible: it tells the device “thanks for the interrupt” and queues a Deferred Procedure Call (DPC).
When the CPU is done handling interrupts, it drains the DPC queue. DPCs also execute at a level so low that it is not quite right to speak of “threads”; like interrupt handlers, they run directly on the CPU, “beneath” the threading system.
The DPC takes the IRP representing the write request and marks it as “complete.” That completion status, however, exists only at the OS level; the process has its own address space and must be notified as well. So the OS queues a special kernel-mode Asynchronous Procedure Call (APC) to the thread that owns the handle.
Because the BCL uses the standard overlapped I/O mechanism, it has already bound the handle to an I/O Completion Port, which is part of the thread pool. So a thread from the I/O thread pool** is borrowed for a moment to execute the APC, which notifies the task that it is complete.
The task had captured the UI context, so the asynchronous method does not resume on that thread-pool thread. Instead, the continuation of the method is queued to the UI context, and the UI thread resumes the method when it gets around to it.
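The mechanics of that last hop can be modeled in a few lines. This is a simplified model of what the awaiter does, not the actual TaskAwaiter source: the SynchronizationContext is captured where the await happens, and the completion callback, running briefly on an I/O pool thread, merely posts the continuation to that context.

using System;
using System.Threading;

public sealed class CapturedContextSketch
{
    // Captured on the thread that performed the await (the UI thread here).
    private readonly SynchronizationContext _uiContext = SynchronizationContext.Current;

    // Invoked for a moment on an I/O pool thread when the operation completes.
    public void OnIoCompleted(Action continuation)
    {
        if (_uiContext != null)
            _uiContext.Post(_ => continuation(), null); // queue to the UI message loop
        else
            continuation();                             // no context captured: run inline
    }
}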
So we see that while the operation was in flight, there was no thread. When the operation completed, various threads were briefly borrowed or had work briefly queued to them. That work typically takes around a millisecond (for example, the APC running on a pool thread) down to mere microseconds (for example, the interrupt handling). But no thread was blocked, just waiting for the operation to complete.
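If you want to convince yourself (or a skeptic) empirically, a small console experiment along these lines is enough. Task.Delay stands in for the device I/O here; it is timer-based rather than driver-based, but, like overlapped I/O, it occupies no thread while pending, so the process thread count barely moves even with hundreds of operations in flight.

using System;
using System.Diagnostics;
using System.Threading.Tasks;

class NoThreadDemo
{
    static async Task Main()
    {
        Console.WriteLine($"Start: thread {Environment.CurrentManagedThreadId}, " +
                          $"{Process.GetCurrentProcess().Threads.Count} OS threads in the process.");

        // Two hundred operations in flight at once. If each one held a thread,
        // the process thread count would explode. It does not.
        var pending = new Task[200];
        for (int i = 0; i < pending.Length; i++)
            pending[i] = Task.Delay(2000);

        Console.WriteLine($"In flight: {Process.GetCurrentProcess().Threads.Count} OS threads.");

        await Task.WhenAll(pending);
        Console.WriteLine($"Done: resumed on thread {Environment.CurrentManagedThreadId}.");
    }
}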

The chain of execution we have traced here is the “standard” one, somewhat simplified. There are countless variations, but the essence stays the same.
The idea that “there must be a thread somewhere performing the asynchronous operation” is simply wrong.
Free your mind. Do not try to find the “async thread”; there is none. Instead, recognize the truth:
There is no thread.
* - I have not seen a generally accepted Russian translation of the term “overlapped I/O”; “asynchronous I/O” is close in meaning (translator's note).
** - the CLR has two thread pools: a pool of worker threads and a pool of I/O threads (translator's note).