
Linux graphics stack

(original by Jasper St. Pierre, GNOME Shell developer)

This is an overview article about the components of the Linux graphics stack and how they fit together. I originally wrote it for myself after talking about this stack with Owen Taylor, Ray Strode and Adam Jackson (Owen Taylor, maintainer of GNOME Shell; Ray Strode, maintainer of many Red Hat desktop packages; Adam Jackson, developer working on the Xorg graphics stack; translator's note).

I kept pestering them, asking about all the little things again and again, and then promptly forgetting those little things. In the end I asked them a question: is there any overview document I could bury myself in and so spare the guys my annoying attention? Having received no affirmative answer, I decided to write this article, which, upon completion, was proofread by Adam Jackson and Dave Airlie. They both work on this stack.

I want to warn you, dear readers: the structure of a large part of the Linux graphics stack as presented in this article holds for the open-source drivers. This means that inside AMD's or nVidia's proprietary drivers things may be a bit different. Or quite different. Or entirely different. They can have their own OpenGL implementations, or a forked copy of the Mesa one. I will describe the stack as implemented in the open-source drivers "radeon" and "nouveau" and in the drivers from Intel.
If you have any questions, or some details seem unclear to you (or I am badly mistaken or confused somewhere), please do not hesitate to say so in the comments.

For starters, I will briefly list all the components of the stack right here, so that you have a general picture of them during the rest of the description.

To be precise, depending on the type of drawing you choose, events inside the stack can unfold in two different ways:


3D rendering with OpenGL


  1. The program starts drawing using OpenGL;
  2. The mesa library provides the OpenGL API. It uses the driver specific to your video card to convert OpenGL calls into something the hardware understands. If the driver is built on Gallium, a shared component first turns OpenGL calls into a common intermediate representation, TGSI; the low-level driver then only has to translate TGSI into commands the hardware understands;
  3. libdrm uses special ioctls to talk to the Linux kernel;
  4. The Linux kernel, having the privileges to do so, can allocate memory for the video card both in video memory and in system memory;
  5. On top of all this, mesa uses DRI2 to talk to Xorg so that buffer swaps, window positions and so on stay synchronized.


2D drawing with cairo


  1. The program starts drawing using cairo;
  2. You draw some gradient-filled circles. Cairo decomposes the circles into trapezoids and sends these trapezoids and gradients to the X server using the XRender extension. If the X server does not support XRender, cairo rasterizes them itself using pixman and sends the resulting pixmap to the X server by other means;
  3. The X server accepts the XRender request. Xorg can use several specialized drivers at this point:
    1) When falling back to software rendering, or when the driver cannot handle the request, Xorg uses pixman to draw it itself, just like cairo would;
    2) With hardware acceleration, the Xorg driver talks via libdrm to the kernel and sends textures and commands to the video card.


And to actually get what has been drawn onto the screen, Xorg sets up the framebuffer using KMS and the video card drivers.

X Window System, X11, Xlib, Xorg


X11 is not only about graphics: it also includes an event delivery system, the notion of window properties and much more. On top of X11, a bunch of things entirely unrelated to graphics are implemented as well (for example, the clipboard and drag-and-drop support). I mention X11 here only to give a general idea of its place within the X Window System. I hope someday to write a separate post about the X Window System, X11 and their strange architecture.


I try to be as careful as possible with naming. When I write "X server", I mean an abstract X server: it could be Xorg, it could be Apple's X server implementation, it could be Kdrive; no difference. When I write "X11" or "X Window System", I mean the architecture of the protocol and the system as a whole. And when I write "Xorg", I am talking about implementation details of Xorg, the most common X server, and in no way about any other X server. If you meet a bare "X", it is a typo or a slip.

X11 (the protocol itself) was designed to be extensible, that is, so that new features can be added without creating a fundamentally new protocol and without losing backward compatibility with old clients. For example, xeyes and oclock get their, so to speak, "non-standard" look from the Shape extension, which provides non-rectangular windows. If you are wondering where such magic functionality appears from, the answer is: magic has nothing to do with it. Support for an extension must be added on both sides, in the client and in the server. The core protocol specification has functionality that lets clients query the server about the available extensions, and based on it clients can decide what to use and what not.
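To make this concrete, querying an extension is just one small request on the wire: the client sends a QueryExtension request (opcode 98) carrying the extension's name, and the server replies with whether it is present. Below is a sketch, in Python, of just the request encoding; the field layout follows the core protocol's encoding document, and the function name is my own:

```python
import struct

def query_extension_request(name: bytes) -> bytes:
    """Build the wire bytes of an X11 QueryExtension request (opcode 98).

    Layout (little-endian client byte order):
      opcode(1) unused(1) request-length-in-4-byte-units(2)
      name-length(2) unused(2) name, padded to a 4-byte boundary
    """
    n = len(name)
    pad = (4 - n % 4) % 4
    length = 2 + (n + pad) // 4        # header is 2 words, the name is the rest
    return struct.pack("<BxHHxx", 98, length, n) + name + b"\x00" * pad

# The request asking the server whether the Shape extension is available:
req = query_extension_request(b"SHAPE")
```

A real client would write these bytes to its socket and read back a reply saying whether the extension exists, plus the major opcode to use when talking to it.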

The X11 architecture was designed to be network transparent. Simply put, we cannot rely on the client and the server (X client and X server) being on the same machine, so communication between them has to work over the network. In practice, modern desktop environments in their usual configuration do not work this way, since besides X11 the interprocess communication involves all kinds of things like DBus. Working over a network connection is quite intensive and generates a lot of traffic. When the client and server parts of the X Window System are on the same machine, they communicate over a UNIX socket instead of a network connection, which cuts out a lot of the overhead.

We will return to the X Window System and a number of its extensions a little bit later.

Cairo


Cairo is a vector graphics drawing library used both directly by ordinary applications (for example, Firefox) and by toolkits such as GTK+. The GTK+ 3 rendering model, for instance, is built entirely on cairo. If you have worked with the HTML <canvas> element, you already have a fairly complete idea of cairo, since their APIs are similar. Although <canvas> was originally introduced by Apple, vector graphics of this kind is much older, going back to the PostScript rendering model, which lives on in standards and technologies such as PDF, Flash, SVG, Direct2D, Quartz 2D, OpenVG and many, many more.

Cairo can draw on X11 surfaces through a special Xlib backend.

In GTK+ 2, cairo was an optional component up to version 2.8. In GTK+ 3, cairo is mandatory.
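The essence of such a vector drawing model is turning curves into simpler primitives. As a rough, self-contained illustration (not cairo's actual code, which subdivides adaptively against an error tolerance rather than with a fixed step count), here is how a cubic Bezier curve can be flattened into a polyline:

```python
def flatten_cubic(p0, p1, p2, p3, steps=16):
    """Approximate a cubic Bezier curve by a polyline by evaluating the
    curve at evenly spaced parameter values.  (Cairo's real flattener
    subdivides adaptively against an error bound instead.)"""
    pts = []
    for i in range(steps + 1):
        t = i / steps
        mt = 1.0 - t
        # The cubic Bernstein basis: (1-t)^3, 3(1-t)^2 t, 3(1-t) t^2, t^3
        x = mt**3 * p0[0] + 3 * mt**2 * t * p1[0] + 3 * mt * t**2 * p2[0] + t**3 * p3[0]
        y = mt**3 * p0[1] + 3 * mt**2 * t * p1[1] + 3 * mt * t**2 * p2[1] + t**3 * p3[1]
        pts.append((x, y))
    return pts
```

Everything cairo ultimately rasterizes (circles, rounded rectangles, font outlines) passes through a flattening step of this kind before tessellation.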

XRender extension


To support rendering of antialiased primitives, the X11 protocol has a special extension, XRender (the core X11 drawing operations do not do antialiasing). Besides antialiased primitives, the extension adds gradients, matrix transformations and so on.

Initially, the rationale for adding such an extension to the protocol was that drivers might have special hardware-accelerated paths for this kind of rendering, which XRender would use.

In practice, though, it turned out that software rasterization is, for not entirely obvious reasons, in no way slower than the hardware. Oh well.

XRender works with y-aligned trapezoids: quadrilaterals whose top and bottom sides are horizontal and whose left and right sides may be non-parallel. Carl Worth and Keith Packard developed a fairly fast software method for rasterizing these primitives. Aligned trapezoids have one more advantage: a trapezoid is easily represented as two triangles, which makes them easier to draw in hardware. Cairo ships a wonderful show-traps utility that demonstrates how primitives submitted for drawing get decomposed into trapezoids.



A simple example of a red circle. The circle is decomposed into two sets of trapezoids: one for the stroke and one for the fill. Since stock show-traps displays this process rather uninformatively, I tweaked the utility's source so that each trapezoid is painted in its own color. Here is the set of trapezoids for drawing the black stroke.



Psychedelic.
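To make the trapezoid idea concrete, here is a toy sketch (my own illustration, not cairo's or XRender's actual tessellator, which runs a sweep line over whole paths and keeps its edges sorted left to right) that splits a single triangle into y-aligned trapezoids:

```python
def triangle_to_traps(a, b, c):
    """Split one triangle into at most two y-aligned trapezoids, the
    primitive the XRender Trapezoids request consumes.  Each result is
    (y_top, y_bottom, edge, edge), an edge being a pair of endpoints.
    Edges are not sorted left/right here, unlike in a real tessellator."""
    top, mid, bot = sorted([a, b, c], key=lambda p: p[1])
    if bot[1] == top[1]:
        return []                       # zero-height triangle: nothing to draw
    # Point on the long edge (top -> bot) at the middle vertex's height:
    t = (mid[1] - top[1]) / (bot[1] - top[1])
    split = (top[0] + t * (bot[0] - top[0]), mid[1])
    traps = []
    if mid[1] > top[1]:                 # upper piece (its top edge is a point)
        traps.append((top[1], mid[1], (top, mid), (top, split)))
    if bot[1] > mid[1]:                 # lower piece (its bottom edge is a point)
        traps.append((mid[1], bot[1], (mid, bot), (split, bot)))
    return traps
```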

pixman


Both the X server and cairo need to push pixels around. Previously, cairo and Xorg each implemented rasterization, pixel-by-pixel access to buffers in various formats (ARGB32, BGR24, RGB565), gradients, matrices and everything else in their own way. Now both cairo and the X server do all this through the relatively low-level pixman library. Oddly, despite being a shared library, pixman has neither a public API nor a specific rendering API; strictly speaking, it has no designed API at all. It is simply a shared repository of code, deduplicated between the two components mentioned above.
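The bread-and-butter operation behind all of this is Porter-Duff compositing over those pixel formats. As a minimal illustration of the idea (pure Python integer math in the spirit of 8-bit channel arithmetic, not pixman's actual code), here is the OVER operator on one premultiplied ARGB32 pixel:

```python
def over(src, dst):
    """Porter-Duff OVER for one premultiplied ARGB32 pixel: the result
    is src + dst * (1 - src_alpha), per channel, with 0-255 channels.
    Pixels are (a, r, g, b) tuples with premultiplied color channels."""
    sa = src[0]
    return tuple(min(255, s + d * (255 - sa) // 255)
                 for s, d in zip(src, dst))
```

Compositing a whole window then amounts to running this (or one of its SIMD-optimized cousins) over every pixel of the destination buffer.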

OpenGL, mesa, gallium


And this is the most fun part: modern hardware-accelerated rendering. I assume everyone already knows what OpenGL is. It is not a library, and it is not even a specific set of sources for building libGL.so. Every vendor ships its own libGL.so, one way or another conforming to the OpenGL specification.

For example, nVidia provides its own OpenGL implementation and its own libGL.so, implemented in much the same way as for Windows or OS X.

If you are using open-source drivers, your libGL.so implementation is most likely based on mesa. Mesa is a big pile of everything, but the main and best-known part of this pile is its open-source implementation of OpenGL. Inside mesa, behind the OpenGL API, various backends translate the API calls into executable commands. There are three software backends:

  1. swrast, the ancient software rasterizer;
  2. softpipe, the reference software driver for Gallium;
  3. llvmpipe, a Gallium driver that uses LLVM to compile shaders into fast native code.

In addition to the software backends, mesa supports hardware ones:

  1. the "radeon" and "nouveau" drivers for AMD and nVidia cards, built on top of Gallium;
  2. the drivers from Intel, which plug into mesa's classic driver interface directly.

In fact, gallium is a set of components on top of which a driver can simply be assembled. The point is that such a driver consists of:

  1. a "state tracker" that implements a specific API (OpenGL, for example) and translates it into the common intermediate representation, TGSI;
  2. a "pipe driver" that translates TGSI into commands for the specific hardware;
  3. a "winsys" that handles the specifics of talking to the kernel and the window system.

Unfortunately, the Intel developers do not use gallium. My colleagues say this comes from the reluctance of the Intel driver developers to have any extra layers between mesa and their driver.

A few abbreviations


Further on, some abbreviations will come up that I would not like to interrupt the narration with a whole separate paragraph for, so I will simply list them here. Many of them are of historical value only, but I will describe them anyway, just so you are aware:

  1. GLES: OpenGL has several profiles for different form factors; OpenGL ES is the one aimed at embedded and mobile devices;
  2. GLX: the X11 extension that glues OpenGL to the X Window System (creating GL contexts for X11 windows and so on);
  3. EGL: a newer API that replaces GLX and is not tied to any particular window system.

Xorg drivers, DRM, DRI


Earlier I wrote that Xorg can do hardware-accelerated rendering. Let me add that this is not implemented by translating X11 drawing commands into OpenGL calls. So how does Xorg talk to the hardware, if the hardware drivers live in the depths of mesa and Xorg is not tied to mesa?

The answer is simple. Really, how is it? Mesa is responsible for implementing OpenGL, Xorg is responsible for rendering X11 commands, and both need to draw on the hardware using hardware-specific commands. So at some point a shared component that submits these commands to the kernel was introduced into the Xorg and mesa architectures: the so-called Direct Rendering Manager, or DRM.

libdrm uses a set of fairly private kernel ioctls to allocate resources on the graphics accelerator and to feed it commands and textures. The generic interface of these ioctls comes (predictably enough) in two flavors:

  1. TTM ("Translation Table Manager");
  2. GEM ("Graphics Execution Manager").

There are no significant differences between them. They both do the same thing and only differ slightly in implementation. Historically, GEM was introduced by Intel as a simple alternative to TTM. Over time GEM grew, and its "simplicity" became about the same as TTM's. So it goes.

What is all this for? Well, for example, when you run a utility like glxgears, it loads mesa. Mesa loads libdrm. libdrm talks to the kernel driver using GEM/TTM. Yes, glxgears talks almost directly to the kernel to show you some spinning gears (and to remind everyone, once again, of the arguments about whether glxgears is a benchmark).
If you run this command in a console (substituting lib32/lib64 depending on your architecture):

ls /usr/lib32/libdrm_*

you will see that there are hardware-specific drivers there. For the cases where the generic GEM/TTM functionality is not enough, the mesa and X server drivers use an even more private set of ioctls to talk to the kernel, which is wrapped in these hardware-specific libraries. libdrm itself does not load these drivers.

The X server needs to know what is happening in the graphics subsystem in order to implement synchronization. This synchronization methodology (for example, between the glxgears you are running, the kernel and the X server) is called DRI, or more precisely DRI2. DRI stands for "Direct Rendering Infrastructure". In general, two things are understood by DRI:

  1. the direct rendering infrastructure as a whole: the kernel subsystems, libdrm and the drivers;
  2. the specific DRI protocol and library used for synchronization between clients and the X server.

Since we are being strict with terminology, and "DRI1" sounds silly, from here on I will talk about the protocol and the library, calling them DRI2.

KMS


Since we have drifted a bit from the topic into infrastructure, let me ask a question. Suppose you are working on a new X server, or you want to display graphics in a virtual terminal without any X server. How would you do it?

You need to configure the hardware so that it can display graphics.

Inside libdrm and the kernel there is a special KMS subsystem that does exactly that. KMS stands for "Kernel Mode Setting". Through yet another set of ioctls, this subsystem lets you set the graphics mode, set up a framebuffer and do everything needed to show graphics right on a TTY. Originally this was a rather scattered set of ioctls, so a shared libkms library was created to replace and standardize them behind a single, documented API.

True, all of a sudden (as is customary in the Linux world), after libkms a new API appeared in the kernel itself, literally called the "dumb ioctls". So at present the recommendation is to use this set of ioctls rather than libkms.

Despite these ioctls being very low-level and simple, they let you do almost everything. A case in point is plymouth, which in almost all modern Linux distributions draws the graphical boot process without starting an X server.

The "Expose" model, redirection, TFP, compositing, AIGLX


One cannot talk about the term "compositing window manager" without understanding what "compositing" is and what a window manager does.

Back in the distant 80s, when the X Window System was being developed for UNIX operating systems, a bunch of companies such as HP, DEC, Sun and SGI were building products on top of it. The X11 protocol did not regulate the rules of window management and delegated responsibility for window behavior to a separate process, called the "window manager".

For example, CDE, a popular desktop environment of its time, followed a window policy called "focus follows mouse": input focus was handed to a window as soon as the user moved the pointer over it. This differs from the window behavior in Windows or Mac OS X, where a window gets focus by clicking.

As windowing environments gained popularity and grew more complex, documents began to appear regulating behavior common to all of them. True, those documents likewise stayed away from dictating a focus policy.

Again, back in those distant 80s many systems simply lacked the memory to store the entire contents of a window in pixel form. Windows and X11 solved this problem in the same way: an X11 window has no persistent pixel state. When needed, an application is notified that it must redraw part of its window (perform an "expose").



Imagine such a set of windows. Now move the GIMP window:



The area shaded dark brown has been exposed. An ExposeEvent is sent to the application that owns the window, and the application redraws the corresponding area of the screen. It is because of this model that hung applications on Windows and Linux show white areas when you drag another window over them. Given that on Windows the desktop is drawn by an ordinary program without special privileges, which can hang in exactly the same way, you can easily understand the cause of this amusing artifact.

Today computers have plenty of memory, so we can have windows that do not lose their pixel contents under X11. This is done with a mechanism called "redirection". When we redirect a window, the X server gives each window its own off-screen pixel buffer to draw into, instead of drawing straight into the on-screen framebuffer. This means the window's contents never appear on the screen directly; something else is responsible for getting those pixels onto the screen.

The Composite extension allows a compositing window manager (or "compositor") to create the so-called Composite Overlay Window, or COW. The compositor becomes the owner of the COW and can draw into it.

When you run Compiz or GNOME Shell, these applications use OpenGL to display the redirected windows on the screen. The X server lets them access the windows' contents through a GL extension, "Texture from Pixmap", or TFP. It allows an OpenGL application to use an X11 pixmap as if it were a native OpenGL texture.

Compositing window managers do not, in principle, have to use TFP or OpenGL; they are simply the easiest way to do compositing. Nothing prevents a window manager from drawing the windows' pixmaps onto the COW by ordinary means. I have been told that kwin4 does exactly that, compositing directly with Qt.

A compositing window manager takes a window's pixmap from the X server via TFP and draws it at the right place in an OpenGL scene, creating the illusion that you are working with a regular X11 window. It may seem silly to call this an "illusion", but you can convince yourself of it in, for example, GNOME Shell, by changing the size and position of existing windows with a bit of GJS code entered into the looking glass:

global.get_window_actors().forEach(function(w) { w.scale_x = w.scale_y = 0.5; });

The illusion of compositing breaks the moment you realize that you are clicking not into the window you aimed at, but into a different one. To put everything back, enter the same code into the looking glass with 0.5 changed to 1.0.

Now that you are aware of all these details, we can decode one more abbreviation: AIGLX, which stands for "Accelerated Indirect GLX". Since X11 is a network-oriented protocol, OpenGL has to be able to work across the network. When OpenGL is used in this mode, it is called an "indirect context", as opposed to the standard "direct context" mode, where OpenGL runs on the same machine as the X server. The sad part is that the network protocol for the indirect context is appallingly incomplete and unstable.

To understand the trade-offs in AIGLX's architecture, you have to understand the problem it was trying to solve: making compositing managers like Compiz really fast. At the time, the proprietary nVidia driver had its own kernel-level memory management interface, while the open graphics stack had none. Transferring a window's pixmap as a texture from the X server to the graphics hardware would therefore mean copying that pixmap every time the window was updated. Wildly slow. So AIGLX was adopted as a temporary crutch: a fast software implementation of indirect OpenGL that avoided copying pixmaps to the hardware. And since the scene drawn by Compiz is usually not very complex, it worked just fine.

Despite much praise and many Phoronix articles, AIGLX was never used for anything serious, simply because we now have a proper DRI stack in which TFP can be implemented without copying.

It should be clear by now that drawing a redirected window through the compositor is still not entirely free. Because of this, most window managers have a setting that disables redirection for windows that occupy the whole screen. Calling this "unredirection" is perhaps silly, since the result is a window behaving exactly as the logic of the X Window System originally intended; historically that was the norm, although in modern Linux it can hardly be called the normal state of a window anymore. Why is unredirection needed? Because a fullscreen window completely covers the COW anyway, so the elaborate compositing step adds nothing and can be switched off. This lets fullscreen applications such as games and video players run without extra copies of window data, at refresh rates up to the maximum allowed 60 frames per second.

Wayland


Above, we have carved a fairly large chunk of infrastructure out of the monolithic architecture of X. And graphics is not the only thing that has fallen out of the monolith over time: almost all input device handling has moved into the kernel via evdev, and device hot-plugging support into udev.

The reason the X Window System is still alive today is that all this time the community's efforts have gone into replacing it piece by piece. That replacement is Xorg with its large collection of extensions, which provide the functionality a modern graphical environment needs. One could say the classic X Window System itself is written-off scrap.

Enter Wayland. Wayland reuses a very large amount of the infrastructure we created while replacing pieces of the X Window System. The only really controversial thing about the Wayland architecture is the absence of network transparency and of a rendering protocol. On the other hand, nowadays the enormous flexibility of a network protocol is no longer needed, since the lion's share of that functionality has already scattered into other services, DBus for example. Frankly, it is embarrassing to look at the hacks in the X Window System architecture made for things like the clipboard or drag-and-drop support solely for compatibility with X's network past.

As already mentioned, Wayland can reuse the entire stack described above to obtain a framebuffer for the monitor and get going. Wayland keeps a dedicated exchange protocol, but it works purely over UNIX sockets and local resources. The biggest difference from Xorg is that there is no /usr/bin/wayland hanging around in memory as a separate process. Instead, in keeping with the spirit of the times and the demands of modern desktop environments, everything links directly into the window manager process. Such a window manager, or rather "compositor" in Wayland terminology, pulls input events from the kernel via evdev, sets up a framebuffer via KMS and DRM, and draws into it using whatever graphics stack it likes, including OpenGL. And although mentioning such a glue layer immediately conjures up tons of code (a whole bunch of systems meet in it), in reality it fits on the order of two to three thousand lines. Sounds like a lot? Consider that just the small part of mutter that handles focus, window stacking, and synchronizing them with the X server is already four to five thousand lines of code.

Although Wayland has a reference library implementing the protocol, and its use is strongly recommended for both clients and compositors, nothing prevents someone from writing an entire Wayland compositor in Python. Or in Ruby. And from implementing the protocol in pure Python, without libwayland.
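That is less crazy than it sounds, because the Wayland wire format is tiny: every message is a 32-bit sender object id, then a 32-bit word packing the message size and opcode, then the arguments. A sketch of packing one request in pure Python (the helper name is mine; only plain 32-bit uint/new_id arguments are handled):

```python
import struct

def wayland_request(object_id, opcode, *args):
    """Pack one Wayland wire-protocol message.

    Header: sender object id (32 bits), then (size << 16) | opcode.
    Arguments here are plain 32-bit words (uint / new_id); the wire
    uses the host's native byte order over the UNIX socket."""
    size = 8 + 4 * len(args)            # 8-byte header plus the arguments
    return struct.pack("=II" + "I" * len(args),
                       object_id, (size << 16) | opcode, *args)

# wl_display (always object id 1) . get_registry(new_id=2), opcode 1:
msg = wayland_request(1, 1, 2)
```

A real client would write msg to the compositor's UNIX socket and then read back wl_registry.global events announcing the available interfaces.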

Wayland clients talk to the compositor and request a buffer. The compositor hands back a buffer into which they may draw with cairo, with OpenGL, or entirely by hand. Then the compositor itself decides what to do with that buffer: just show it as is, give it prominence because the application demands attention, or spin it... um... on a cube, because we want to post yet another YouTube video with windows on a cube. Well, you get it.

In addition, the compositor is responsible for input and event handling. If you tried running that piece of GJS code in GNOME Shell, you were probably puzzled: why does the mouse act as if the windows were untransformed? Because we affected only the display of the window, not the window itself inside X11. The X server tracks the windows itself and merely hopes that the compositing window manager displays them accordingly. When it does not, you get exactly the kind of puzzlement described above.

Since a Wayland compositor works with evdev and hands events to windows itself, it knows much better where the windows are and how they are displayed, and can perform all the necessary transformations on its own. So with such a compositor we can not only spin windows around on a cube, but also work with them right on the cube.

Conclusions


I often hear claims that the Xorg implementation is monolithic. There is a grain of truth in such claims, of course, but over time there is less and less of it. And this is not a result of the incompetence of the Xorg developers, no. We simply have to live not only with Xorg but with all the baggage accumulated over many years: the hardware-accelerated XRender protocol, for example, or, going back further, non-antialiased drawing commands like XPolyFill. It is clear that in time X will leave the stage and be replaced by Wayland. But I want it to be clear that this is being done with full understanding and with tremendous help from the developers of the desktop environments and of Xorg. They are not stubborn, and they are not incompetent. Hell, maintaining a thirty-year-old protocol without breakage while rebuilding its architecture is an excellent piece of work on their part.

I also want to express my gratitude to everyone who worked on the things this article is about. Many thanks to Owen Taylor, Ray Strode and Adam Jackson for their patience and their answers to all my silly questions. Special thanks to Dave Airlie and Adam Jackson for help with the technical proofreading of this article.

Although I have only skimmed the main things in the Linux graphics stack, you can always dig deeper if you are interested. For example, you could read about the geometric algorithms and theory behind cairo's decomposition of primitives into trapezoids. Or it may be worth looking at the algorithm for fast software rasterization of those trapezoids and understanding why it is so fast. Try digging into DRI2. And if you are interested in the hardware itself and how it draws, dig up the datasheets and try programming it yourself. In any case, if you decide to go deep into any of these areas, the communities and projects listed above will be glad to receive you and your contribution.

I plan to write more about all this. Linux uses many different technology stacks, and the GNOME community still has no sane overview documents describing them at a more or less high level.

Thanks to the users Roosso, trollsid and Xitsa for reading this over.

Source: https://habr.com/ru/post/148954/

