
Meet the Windows pseudo console (ConPTY)

Article published August 2, 2018

This is the second article in a series about the Windows command line. Here we discuss the new Windows Pseudo Console (ConPTY) infrastructure and API: why we developed it, what it is for, how it works, how to use it, and more.

In the previous article, “The heavy legacy of the past. Windows command line problems”, we covered the origins of the terminal and the evolution of the command line in Windows, and began to explore the internals of the Windows Console and the Windows command-line infrastructure. We also discussed the many advantages and major shortcomings of the Windows Console.

One of those shortcomings is that Windows tries to be “helpful”, but in doing so gets in the way of developers of alternative and third-party consoles, services, and so on. When building a console or service, developers need to access, or supply, the communication channels through which their terminal/service talks to command-line applications. In the *NIX world this is not a problem, because *NIX provides a “pseudo terminal” (PTY) infrastructure that makes it easy to create communication channels for a console or service. Windows had nothing of the kind ...

… until now!

From TTY to PTY


Before discussing our work in detail, let's briefly return to the history of terminals.

TTY was first


As discussed in the previous article, in the early days of computing, users controlled computers using electromechanical teletypes (TTY) connected to the computer over some kind of serial link (usually a 20 mA current loop).


Ken Thompson and Dennis Ritchie (standing) working on a DEC PDP-11 via teletype (note the absence of an electronic display)

The spread of terminals


Teletypes were replaced by computerized terminals with electronic displays (usually CRT screens). As a rule, terminals were very simple devices (hence the term “dumb terminal”), containing only the electronics and computing power needed for the following tasks:

  1. Receiving text input from the keyboard.
  2. Buffering the entered text one line at a time (including local editing before sending).
  3. Sending / receiving text over a serial link (usually via the once-ubiquitous RS-232 interface).
  4. Displaying the received text on the terminal's screen.

Despite their simplicity (or perhaps because of it), terminals quickly became the main tool for managing minicomputers, mainframes, and servers: most data entry operators, computer operators, system administrators, scientists, researchers, software developers, and industry luminaries worked on terminals from DEC, IBM, Wyse, and many others.


Admiral Grace Hopper in her office with a DEC VT220 terminal on the desk

The spread of software terminals


From the mid-1980s, specialized terminals began to give way to general-purpose computers, which were becoming more affordable, popular, and powerful. Many early PCs and other computers of the 1980s had terminal applications that opened a connection over the machine's RS-232 port and communicated with whatever was at the other end of the connection.

As general-purpose computers became more sophisticated, a graphical user interface (GUI) and a whole new world of simultaneously running applications, including terminal applications, appeared.

But this raised a problem: how can a terminal application interact with another command-line application running on the same machine? How do you “physically” connect a serial cable between two applications running on the same computer?

The emergence of a pseudo terminal (PTY)


In the *NIX world, the problem was solved by introducing the pseudo terminal (PTY).

A PTY emulates serial communication hardware inside the computer by exposing a pair of pseudo-devices, the “master” and the “slave”: terminal applications connect to the master pseudo-device, and command-line applications (for example, shells like cmd, PowerShell, and bash) connect to the slave pseudo-device. When a terminal client sends text and / or control commands (encoded as text) to the master pseudo-device, the text is relayed to the associated slave. Text from the application is sent to the slave pseudo-device, then back to the master and, thus, to the terminal. Data is always sent / received asynchronously.


Pseudo-terminal application / shell

It is important to note that the “slave” pseudo-device emulates the behavior of a physical terminal and converts control characters into POSIX signals. For example, if the user presses CTRL + C in the terminal, the ASCII value for CTRL + C (0x03) is sent through the master device. When it arrives at the slave pseudo-device, the value 0x03 is removed from the input stream and a SIGINT signal is generated.
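The CTRL + C handling described above can be imitated in a few lines of plain C. This is only a user-space simulation under stated assumptions: on a real system the kernel's terminal layer does this work and delivers SIGINT to the foreground process, whereas the hypothetical `filter_interrupts` helper below merely counts the interrupts it would have raised.

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_C 0x03 /* ASCII ETX, the byte a terminal sends for CTRL + C */

/* Copies `n` input bytes to `out`, dropping every CTRL + C byte and counting
 * it as one pending interrupt. Returns the number of bytes written to `out`,
 * which must have room for at least `n` bytes. */
size_t filter_interrupts(const unsigned char *in, size_t n,
                         unsigned char *out, int *interrupts)
{
    size_t written = 0;

    assert(in != NULL && out != NULL && interrupts != NULL);
    *interrupts = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == CTRL_C) {
            (*interrupts)++;        /* a real PTY would generate SIGINT here */
        } else {
            out[written++] = in[i]; /* ordinary input passes through */
        }
    }
    return written;
}
```

The application on the slave side never sees the 0x03 byte; it only observes the signal.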

This PTY infrastructure is widely used by *NIX terminal applications, text-mode pane managers (for example, screen, tmux), and so on. Such applications call openpty(), which returns a pair of file descriptors (fd) for the PTY master and slave devices. The application can then fork / exec a child command-line application (for example, bash), which uses its slave fd to read input from, and write text back to, the connected terminal.

This mechanism allows terminal applications to "talk" directly to command-line applications running locally, just as a terminal would talk to a remote computer over a serial / network connection.
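A minimal sketch of the master / slave pair, assuming Linux with glibc, where openpty() is declared in <pty.h> (BSD and macOS declare it in <util.h>, and older glibc versions need linking with -lutil). To stay short it does not fork a child shell; the hypothetical `pty_roundtrip` helper plays both roles itself, writing to the master as a terminal would and reading from the slave as a shell would.

```c
#include <assert.h>
#include <pty.h>      /* openpty(); <util.h> on BSD/macOS */
#include <string.h>
#include <unistd.h>

/* Writes `line` to the master side of a fresh PTY and reads it back from the
 * slave side. `line` should end in '\n': the slave starts in line-buffered
 * (canonical) mode, so a read would block until a full line has arrived.
 * Returns the number of bytes read, or -1 on any failure. */
ssize_t pty_roundtrip(const char *line, char *buf, size_t cap)
{
    int master_fd, slave_fd;
    ssize_t got = -1;
    size_t len;

    assert(line != NULL && buf != NULL);
    if (openpty(&master_fd, &slave_fd, NULL, NULL, NULL) == -1)
        return -1;  /* no PTY support, or no PTY devices available */

    len = strlen(line);
    if (write(master_fd, line, len) == (ssize_t)len)
        got = read(slave_fd, buf, cap);

    close(slave_fd);
    close(master_fd);
    return got;
}
```

A real terminal would instead hand the slave fd to a forked child as its stdin / stdout / stderr and keep only the master fd for itself.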

What, no pseudo console in Windows?


As we discussed in the previous article, while the Windows console is conceptually similar to a traditional *NIX terminal, it differs in several key ways, especially at the lowest levels, and these differences can cause problems for developers of Windows command-line applications, third-party terminals / consoles, and server applications:

  1. There is no PTY infrastructure in Windows: when a user starts a command-line application (for example, Cmd, PowerShell, wsl, ipconfig, etc.), Windows itself “connects” a new or existing console instance to the application.
  2. Windows gets in the way of third-party consoles and server applications: Windows (until now) has given terminals no way to supply the communication channels through which they want to talk to a command-line application. Third-party terminals have to create a console off-screen, send it the user's input, and scrape its output, redrawing it on the third-party console's own display!
  3. The Console API exists only on Windows: Windows command-line applications rely on the Win32 Console API, which reduces code portability, since all other platforms work with text / VT rather than such an API.
  4. Non-standard remote access: the dependence of command-line applications on the Console API significantly complicates interoperability and remote-access scenarios.

What to do?


Many, many developers have often requested a PTY-like mechanism under Windows, especially those who work with ConEmu / Cmder, Console2 / ConsoleZ, Hyper, VSCode, Visual Studio, WSL, Docker and OpenSSH tools.

Even Peter Bright, the technology editor of Ars Technica, asked for a PTY mechanism just a few days after I began working on the Console team:



And recently again:



Well, we finally did it: we created a pseudo console for Windows!

Welcome to the Windows pseudo console (ConPTY)


Since the Console team was formed about four years ago, it has been engaged in a major overhaul of the Windows Console and the internals of the command line. Throughout, we regularly and thoroughly considered the issues described above, along with many other related problems. But the infrastructure and code were not ready to make a pseudo console release possible ... until now!

The new Windows pseudo console infrastructure (ConPTY), its API, and some other related changes will eliminate or ease a whole class of problems ... without breaking backward compatibility with existing command-line applications!

The new Win32 ConPTY APIs (official documentation will be published soon) are now available in the latest Windows 10 Insider builds and the corresponding Windows 10 Insider Preview SDK. They will ship in the next major release of Windows 10 (sometime in autumn / winter 2018).

ConHost Console Architecture


To understand ConPTY, you need to study the architecture of the Windows console, or rather ... ConHost!

It is important to understand that although ConHost implements everything you see and know as the Windows Console application, it also contains and implements most of the Windows command-line infrastructure! From now on, ConHost becomes a true “console host”, supporting all command-line applications and / or GUI applications that interact with command-line applications!

How? Why? What? Let's take a closer look.

Here is a high-level view of the internal console / ConHost architecture:



Compared to the architecture from the previous article, ConHost now contains several additional modules for processing VT and a new ConPTY module that implements the public API.


Ok, but what does that really mean?

How do Windows command line applications work?


To better understand the impact of the new ConPTY infrastructure, let's look at how Windows console and command-line applications have worked so far.

Whenever a user starts a command-line application, such as Cmd, PowerShell, or ssh, Windows creates a new Win32 process into which it loads the executable binary of the application and any dependencies (resources or libraries).

The newly created process usually inherits the stdin and stdout handles from its parent. If the parent process was a Windows GUI process, there are no stdin and stdout handles to inherit, so Windows spins up a new console instance and attaches the new application to it. Communication between command-line applications and their console is carried by ConDrv.

For example, when a command-line application is started from a PowerShell instance without elevation, the new process inherits its parent's stdin / stdout handles and therefore receives input from, and sends output to, the same console as its parent.

A small caveat is needed here: in some cases command-line applications are launched attached to a new console instance, particularly for security reasons, but the description above is usually accurate.

Ultimately, when a command-line / shell application is launched, Windows connects it to the console instance (ConHost.exe) via ConDrv:



How does ConHost work?


Whenever a command line application is executed, Windows connects the application to a new or existing instance of ConHost. An application and its console instance are connected through the kernel-mode console driver (ConDrv), which sends / receives IOCTL messages containing serialized API call requests and / or text data.

As described in the previous article, ConHost's job has so far been relatively simple:


When a command-line application calls the Windows Console API, the API calls are serialized into IOCTL messages and sent via the ConDrv driver. ConDrv then delivers the IOCTL messages to the attached console, which decodes them and performs the requested API call. Return / output values are serialized back into an IOCTL message and sent back to the application via ConDrv.
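The round trip described above can be sketched as a pair of encode / dispatch functions. Everything here is an illustrative assumption: the `ApiMessage` layout, the `OP_WRITE_TEXT` opcode, and both function names are invented for this sketch, and the real ConDrv message format is not described in this article.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout, NOT the real ConDrv wire format. */
enum { OP_WRITE_TEXT = 1 };

typedef struct {
    uint32_t opcode;      /* which API call is being requested */
    uint32_t length;      /* number of valid bytes in payload */
    char     payload[64]; /* serialized arguments (here: just the text) */
} ApiMessage;

/* Application side: pack a "write text" call into a message. */
ApiMessage encode_write(const char *text)
{
    ApiMessage msg = { OP_WRITE_TEXT, 0, {0} };
    size_t len = strlen(text);

    assert(len <= sizeof msg.payload);
    msg.length = (uint32_t)len;
    memcpy(msg.payload, text, len);
    return msg;
}

/* Console side: decode the message and perform the call against an output
 * buffer. The return value stands in for the result that would be
 * serialized back to the caller: bytes written, or -1 on a bad request. */
int dispatch(const ApiMessage *msg, char *screen, size_t cap)
{
    if (msg->opcode != OP_WRITE_TEXT ||
        msg->length > sizeof msg->payload || msg->length >= cap)
        return -1;
    memcpy(screen, msg->payload, msg->length);
    screen[msg->length] = '\0';
    return (int)msg->length;
}
```

In the real system the driver sits between these two halves, so the application and its console never share memory directly.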

ConHost: a contribution to the past for the sake of the future


Microsoft tries to maintain backward compatibility with existing applications and tools whenever possible, especially for the command line. In fact, 32-bit versions of Windows 10 can still run many / most 16-bit Win16 applications and executables!

As mentioned above, one of the key roles of ConHost is to provide services to its command-line applications, especially legacy applications that call and rely on the Win32 console API. ConHost now also offers new services:


Below is an example of how a modern console application communicates with a command line application via ConPTY ConHost.



In this new model:

  1. Console:
    1. Creates its own communication channels.
    2. Calls the ConPTY API to create a ConPTY, causing Windows to start an instance of ConHost connected to the other end of the channels.
    3. Creates an instance of a command-line application (for example, PowerShell) connected to ConHost, as usual.
  2. ConHost:
    1. Reads UTF-8 text / VT from its input and converts it to INPUT_RECORD entries that are sent to the command-line application.
    2. Performs API calls from the command-line application, which may modify the contents of the output buffer.
    3. Renders changes to the output buffer as UTF-8-encoded text / VT and sends the resulting text to its console.
  3. Command-line application:
    1. Works as usual, reading input and calling the Console API, with no idea that its ConPTY ConHost is translating its input and output from / to UTF-8!
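The input half of ConHost's job in the list above, turning text / VT into key events, can be sketched as follows. This is a simplification under stated assumptions: `KeyEvent` is a hypothetical stand-in for the key-event part of the Win32 INPUT_RECORD structure, and only the four CSI arrow-key sequences (ESC [ A through ESC [ D) are recognized. The virtual-key values are the documented Win32 codes for the arrow keys.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the key-event part of INPUT_RECORD. */
typedef struct {
    unsigned short virtual_key; /* e.g. 0x26 (VK_UP); 0 for a plain character */
    char           ch;          /* character value; 0 for non-character keys */
} KeyEvent;

/* Translates `n` bytes of text/VT input into key events. ESC [ A..D become
 * arrow-key events; every other byte becomes a character event.
 * Returns the number of events written (at most `cap`). */
size_t vt_to_key_events(const char *in, size_t n, KeyEvent *out, size_t cap)
{
    /* Virtual-key codes for final bytes 'A'..'D': up, down, right, left. */
    static const unsigned short arrows[4] = { 0x26, 0x28, 0x27, 0x25 };
    size_t count = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n && count < cap; i++) {
        if (in[i] == '\x1b' && i + 2 < n && in[i + 1] == '[' &&
            in[i + 2] >= 'A' && in[i + 2] <= 'D') {
            out[count].virtual_key = arrows[in[i + 2] - 'A'];
            out[count].ch = 0;
            i += 2; /* skip the rest of the escape sequence */
        } else {
            out[count].virtual_key = 0;
            out[count].ch = in[i];
        }
        count++;
    }
    return count;
}
```

The command-line application then reads these events through the Console API exactly as if they had come from a physical keyboard.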

The last point is important! When an older command-line application calls Console API functions such as WriteConsoleOutput(...), the specified text is written to the corresponding ConHost output buffer. Periodically, ConHost renders the modified regions of the output buffer as text / VT, which is sent back to the console via stdout.
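Rendering “modified regions as text / VT” can be pictured as a diff between two snapshots of the screen buffer. The sketch below is an assumption about the general technique, not ConHost's actual renderer: the hypothetical `render_changes` function emits a standard cursor-position sequence (ESC [ row ; col H, 1-based) followed by the new characters for each changed run of cells, and ignores colors and other attributes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compares two screen buffers of rows x cols characters (row-major, no
 * terminators). For each run of changed cells, appends a cursor-position
 * sequence and the new characters to `out`.
 * Returns the length of the string in `out`, or -1 if `cap` is too small. */
int render_changes(const char *prev, const char *next, int rows, int cols,
                   char *out, size_t cap)
{
    size_t used = 0;

    assert(prev != NULL && next != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    for (int r = 0; r < rows; r++) {
        int c = 0;
        while (c < cols) {
            if (prev[r * cols + c] == next[r * cols + c]) {
                c++;
                continue;
            }
            int start = c; /* first changed cell of this run */
            while (c < cols && prev[r * cols + c] != next[r * cols + c])
                c++;
            int n = snprintf(out + used, cap - used, "\x1b[%d;%dH%.*s",
                             r + 1, start + 1, c - start,
                             next + r * cols + start);
            if (n < 0 || (size_t)n >= cap - used)
                return -1; /* output would have been truncated */
            used += (size_t)n;
        }
    }
    return (int)used;
}
```

Because only the changed cells are sent, the terminal at the other end receives ordinary text / VT regardless of which Console API calls produced the change.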

In the end, even traditional command-line applications from the outside “speak” with the text / VT without any changes !

Using the new ConPTY infrastructure, third-party consoles can now directly interact with modern and traditional command-line applications and exchange data with all of them in the text / VT.

Remote interaction with Windows command line applications


The mechanism described above works well on a single machine, but ConPTY also helps when interacting with, for example, a PowerShell instance on a remote Windows machine or in a container.

When you run a command-line application remotely (that is, on a remote computer, on a server, or in a container), there is a problem: command-line applications on remote machines communicate with a ConHost instance local to them, because IOCTL messages are not designed to be sent over the network. How, then, do you get input from the local console to the remote machine, and output from the application running there back again? And what about Mac and Linux machines, which have terminals but no Windows-compatible consoles?

Thus, in order to remotely control a Windows machine, we need some kind of communication broker that can transparently serialize data across the network, control the lifetime of the application instance, etc.

Maybe something like ssh ?

Fortunately, OpenSSH was recently ported to Windows and added to Windows 10 as an optional feature. PowerShell Core also uses ssh as one of the protocols supported by PowerShell Core Remoting. And for those who work in Windows PowerShell, Windows PowerShell Remoting is still a perfectly good option.

Let's take a look at how OpenSSH for Windows now allows you to remotely control Windows shells and Windows command line applications:



At present, OpenSSH has to deal with some undesirable complexity:

  1. User:
    1. Starts the ssh client, and Windows connects the console instance as usual.
    2. Enters text into the console that sends keystrokes to the ssh client
  2. ssh client:
    1. Reads input as bytes of text data.
    2. Sends text data over the network to the sshd listening service.
  3. The sshd service goes through several stages:
    1. Runs the default shell (for example, Cmd), which causes Windows to create and attach a new console instance.
    2. Finds and attaches to the Cmd instance's console.
    3. Moves the console off-screen (and / or hides it)
    4. Sends input from an ssh client to an off-screen console as input.
  4. The cmd instance works as always:
    1. Collects input from sshd service
    2. Performs work
    3. Calls the Console API to output / style text, move the cursor, etc.
  5. Attached [offscreen] console:
    1. Performs API calls, updating the output buffer
  6. Sshd service:
    1. Scrapes the off-screen console's output buffer, finds the differences, encodes them as text / VT, and sends them back to ...
  7. The ssh client that sends the text ...
  8. The console that displays text

Fun, right? Not at all! A lot can go wrong in this setup, especially when simulating and sending user input and when scraping the off-screen console's output buffer. This leads to instability, crashes, data corruption, excessive power consumption, and so on. In addition, not all applications scrape the text's attributes along with the text itself, so formatting and color are lost!

Remote operation using modern ConHost and ConPTY


Surely we can improve on this? Yes, of course we can: let's make a few architectural changes and apply our new ConPTY:



The diagram shows that the scheme has changed as follows:

  1. User:
    1. Starts the ssh client, and Windows connects the console instance as usual.
    2. Enters text into the console that sends keystrokes to the ssh client
  2. ssh client:
    1. Reads input as bytes of text data.
    2. Sends text data over the network to the sshd listening service.
  3. Sshd service:
    1. Creates stdin / stdout channels
    2. Calls the ConPTY API to initiate ConPTY
    3. Runs a Cmd instance connected to the other end of the ConPTY. Windows initiates and connects a new instance of ConHost
  4. The cmd instance works as always:
    1. Collects input from sshd service
    2. Performs work
    3. Calls the Console API to output / style text, move the cursor, etc.
  5. ConPTY ConHost instance:
    1. Performs API calls, updating the output buffer
    2. Displays the modified output buffer regions as text / VT in UTF-8 encoding, which is sent back to the console / terminal via ssh

The ConPTY approach is clearly cleaner and simpler for the sshd service. Windows Console API calls are handled entirely inside the command-line application's ConHost instance, which converts all visible changes to text / VT. Whoever connects to ConHost does not need to know that the application is calling the Console API rather than generating text / VT itself!

You will agree that this new ConPTY remoting mechanism leads to an elegant, consistent, and simple architecture. Combined with the powerful features built into ConHost, its support for older applications, and its rendering of changes made through the Console API as text / VT, the new ConHost and ConPTY infrastructure helps us carry the past into the future.

ConPTY API and how to use it


The ConPTY API is available in the current version of the Windows 10 Insider Preview SDK .

By now, I’m sure that you’re looking forward to seeing some code;)

Take a look at the API declarations:

    // Creates a "Pseudo Console" (ConPTY).
    HRESULT WINAPI CreatePseudoConsole(
        _In_ COORD size,        // ConPty Dimensions
        _In_ HANDLE hInput,     // ConPty Input
        _In_ HANDLE hOutput,    // ConPty Output
        _In_ DWORD dwFlags,     // ConPty Flags
        _Out_ HPCON* phPC);     // ConPty Reference

    // Resizes the given ConPTY to the specified size, in characters.
    HRESULT WINAPI ResizePseudoConsole(_In_ HPCON hPC, _In_ COORD size);

    // Closes the ConPTY and all associated handles. Client applications attached
    // to the ConPTY will also be terminated.
    VOID WINAPI ClosePseudoConsole(_In_ HPCON hPC);

The ConPTY API above essentially exposes three new functions: CreatePseudoConsole, ResizePseudoConsole, and ClosePseudoConsole.
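As a rough sketch of how the three functions fit together, the hypothetical `conpty_smoke_test` helper below creates a ConPTY over two anonymous pipes, resizes it, and closes it again. It assumes a Windows 10 build and SDK that include ConPTY. CreatePipe and CloseHandle are ordinary Win32 calls rather than part of the ConPTY API, and launching a client process attached to the ConPTY is deliberately left out. On other platforms the code compiles to a stub that simply reports failure.

```c
#ifdef _WIN32
#include <windows.h>

/* Creates an 80x25 ConPTY, resizes it to 120x30, and closes it.
 * Returns 0 on success, -1 on failure. */
int conpty_smoke_test(void)
{
    HANDLE inRead = NULL, inWrite = NULL;   /* terminal writes, ConPTY reads */
    HANDLE outRead = NULL, outWrite = NULL; /* ConPTY writes, terminal reads */
    HPCON hPC;
    COORD size = { 80, 25 };
    int result = -1;

    if (!CreatePipe(&inRead, &inWrite, NULL, 0))
        return -1;
    if (!CreatePipe(&outRead, &outWrite, NULL, 0)) {
        CloseHandle(inRead);
        CloseHandle(inWrite);
        return -1;
    }

    /* The ConPTY takes the read end of the input pipe and the write end of
     * the output pipe; the terminal keeps the opposite ends. */
    if (SUCCEEDED(CreatePseudoConsole(size, inRead, outWrite, 0, &hPC))) {
        COORD bigger = { 120, 30 };
        if (SUCCEEDED(ResizePseudoConsole(hPC, bigger)))
            result = 0;
        ClosePseudoConsole(hPC);
    }

    CloseHandle(inRead);
    CloseHandle(outWrite);
    CloseHandle(inWrite);
    CloseHandle(outRead);
    return result;
}
#else
/* ConPTY is a Windows-only API; elsewhere this sketch just reports failure. */
int conpty_smoke_test(void) { return -1; }
#endif
```

A real terminal would write user input to inWrite and keep reading text / VT from outRead, typically on a separate thread, for as long as the ConPTY is open.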

Source: https://habr.com/ru/post/420853/

