Relatively recently the new Vulkan API was released - you could call it the heir to OpenGL, although Vulkan is based on AMD's Mantle API.
Of course, the development and support of OpenGL has not stopped, and DirectX 12 has been released as well. Why DirectX 12 is available only on Windows 10 I, unfortunately (or perhaps fortunately), do not know. But the cross-platform Vulkan caught my interest. In this article I will try to explain what Vulkan's features are and how to use it correctly.
So what is Vulkan for and where can it be used? In games and applications that work with graphics? Of course! Computing, the way CUDA or OpenCL do it? No problem. Do we need a window or a display for all this? Not at all: you decide where to send your result, or whether to present it at all. But first things first.
API design and basics
Perhaps we should start with the simplest things. Since the Khronos Group worked on the Vulkan API, the syntax is very similar to OpenGL. The whole API carries the vk prefix. For example, functions (sometimes with very long names) look like this: vkDoSomething(...), the names of structures and handles like this: VkSomething, and all constant expressions (macros, macro calls and enumeration values) like this: VK_SOMETHING. There is also a special kind of function - commands - which add the Cmd prefix: vkCmdJustDoIt(...).
You can write Vulkan code in both C and C++; the second option is, of course, more convenient. There are (and will be) bindings for other languages: someone has already made a Delphi port, and someone wants (why?) a Python one.
So, how do you create a render context? You don't - there is no such thing here. Instead, other concepts with different names were introduced, some of which even resemble DirectX.
Getting started and basic concepts
Vulkan separates two concepts: the device and the host. The device executes all the commands sent to it, and the host sends them. In effect, our application is the host - that is exactly the terminology Vulkan uses.
To work with Vulkan we will need handles to its instance (perhaps even more than one), and also to a device - again, one may not always be enough.
Vulkan can easily be loaded dynamically. In the SDK (developed by LunarG), if the VK_NO_PROTOTYPES macro is declared and the Vulkan library is loaded by hand (not by the linker, but by code at run time), the first thing you need is the vkGetInstanceProcAddr function. Through it you can get the addresses of the core Vulkan functions: those that work without an instance, including the function that creates one, and those that work with an instance, including the function that destroys it and the function that creates a device. After creating a device, you can get the functions that work with it (and with its child handles) via vkGetDeviceProcAddr.
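A minimal sketch of such manual loading, assuming the loader library has already been opened with dlopen()/LoadLibrary() and that loadSymbol() is a hypothetical wrapper around dlsym()/GetProcAddress():

```cpp
#define VK_NO_PROTOTYPES
#include <vulkan/vulkan.h>

// Hypothetical helper that wraps dlsym()/GetProcAddress() on the loaded library.
void* loadSymbol(const char* name);

// The only entry point we need from the library itself.
PFN_vkGetInstanceProcAddr pfnGetInstanceProcAddr =
    (PFN_vkGetInstanceProcAddr)loadSymbol("vkGetInstanceProcAddr");

// Global-level functions are queried with a null instance.
PFN_vkCreateInstance pfnCreateInstance =
    (PFN_vkCreateInstance)pfnGetInstanceProcAddr(nullptr, "vkCreateInstance");

// Once the instance exists, instance-level functions (including vkCreateDevice)
// are queried through it; device-level functions then come from vkGetDeviceProcAddr.
```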
Interesting fact: in Vulkan you always have to fill in a structure with data to create an object. And everything in Vulkan works this way: prepare things in advance so they can be used often and with high performance. In the instance information you can also put information about your application, the engine version, the API version used and so on.
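For illustration, a minimal sketch of instance creation; the application and engine names are purely illustrative, and the usual statically linked loader is assumed:

```cpp
#include <vulkan/vulkan.h>

VkApplicationInfo appInfo = {};
appInfo.sType              = VK_STRUCTURE_TYPE_APPLICATION_INFO;
appInfo.pApplicationName   = "MyApp";                  // illustrative values
appInfo.applicationVersion = VK_MAKE_VERSION(1, 0, 0);
appInfo.pEngineName        = "MyEngine";
appInfo.engineVersion      = VK_MAKE_VERSION(1, 0, 0);
appInfo.apiVersion         = VK_API_VERSION_1_0;

VkInstanceCreateInfo createInfo = {};
createInfo.sType            = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
createInfo.pApplicationInfo = &appInfo;
// Layers and extensions would also be listed here (see the next section).

VkInstance instance = VK_NULL_HANDLE;
VkResult result = vkCreateInstance(&createInfo, nullptr, &instance);
```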
Layers and extensions
Pure Vulkan performs almost no validation of incoming data. It is told to do something - it does it, even if that leads to an application, driver or video card error. This is done for performance. However, if necessary, you can easily attach validation layers, as well as extensions, to the instance and/or the device.
Layers
The main purpose of the layers is to check incoming data for errors and to monitor how Vulkan is used. They work very simply: say we call a function; it first hits the topmost layer specified when the instance or device was created. That layer checks everything for correctness and then passes the call on to the next one, and so on until it reaches the Vulkan core. Of course, you can write your own layers. For example, Steam released a SteamOverlay layer (although I have no idea what it actually does). Enabling a layer is just a matter of listing its name at creation time, as sketched below. Still, the layers themselves stay silent, but they will not bring the application down either. So how do you find out whether everything was done correctly? For that there is a special extension!
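A minimal sketch of enabling a validation layer when creating the instance; the layer name below is the one shipped with the LunarG SDK at the time (newer SDKs call it VK_LAYER_KHRONOS_validation), so check what vkEnumerateInstanceLayerProperties actually reports:

```cpp
#include <vulkan/vulkan.h>
#include <vector>

// List what is available (the requested layer should be checked against this).
uint32_t layerCount = 0;
vkEnumerateInstanceLayerProperties(&layerCount, nullptr);
std::vector<VkLayerProperties> available(layerCount);
vkEnumerateInstanceLayerProperties(&layerCount, available.data());

const char* layers[] = { "VK_LAYER_LUNARG_standard_validation" };

VkInstanceCreateInfo createInfo = {};
createInfo.sType               = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
createInfo.enabledLayerCount   = 1;
createInfo.ppEnabledLayerNames = layers;
// ... pApplicationInfo, extensions, then vkCreateInstance as before ...
```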
Extensions
As the name implies, extensions add extra functionality to Vulkan. For example, one of them (debug report) delivers errors (and not only errors) from all the layers. To use it, you register a callback function; what to do with the information it receives is up to you. Keep in mind that it is a callback, so delays can cost you dearly, especially if you dump everything it receives straight to the console. After handling a message you can indicate whether the call should be passed further (to the next layer) or not - this way you can abort on critical errors while continuing to work through the less dangerous ones.
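A minimal sketch of hooking up VK_EXT_debug_report, assuming the extension name was listed in ppEnabledExtensionNames when the instance was created:

```cpp
#include <cstdio>

static VKAPI_ATTR VkBool32 VKAPI_CALL debugCallback(
    VkDebugReportFlagsEXT flags, VkDebugReportObjectTypeEXT objectType,
    uint64_t object, size_t location, int32_t messageCode,
    const char* pLayerPrefix, const char* pMessage, void* pUserData)
{
    fprintf(stderr, "[%s] %s\n", pLayerPrefix, pMessage);
    return VK_FALSE; // VK_FALSE: pass the call on; VK_TRUE: abort it
}

// The extension's functions are not exported by the loader, so fetch them manually.
PFN_vkCreateDebugReportCallbackEXT pfnCreateDebugReportCallback =
    (PFN_vkCreateDebugReportCallbackEXT)
        vkGetInstanceProcAddr(instance, "vkCreateDebugReportCallbackEXT");

VkDebugReportCallbackCreateInfoEXT callbackInfo = {};
callbackInfo.sType       = VK_STRUCTURE_TYPE_DEBUG_REPORT_CALLBACK_CREATE_INFO_EXT;
callbackInfo.flags       = VK_DEBUG_REPORT_ERROR_BIT_EXT | VK_DEBUG_REPORT_WARNING_BIT_EXT;
callbackInfo.pfnCallback = debugCallback;

VkDebugReportCallbackEXT callback = VK_NULL_HANDLE;
pfnCreateDebugReportCallback(instance, &callbackInfo, nullptr, &callback);
```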
There are also other extensions, some of which I will discuss later in this article.
Device
Vulkan separates the concepts of a physical device and a logical one.
A physical device can be your video card (there may be more than one) or a processor with graphics support.
A logical device is created on the basis of a physical one: information about the physical devices is gathered, the right one is chosen, the rest of the necessary information is prepared, and the device is created. There can be several logical devices based on one physical device, but it is not (yet?) possible to combine physical devices for a single operation.
So what kind of information do we gather? Supported formats, memory properties, features and, of course, the queue families.
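A minimal sketch of picking a physical device and querying its capabilities (for brevity it just takes the first device reported):

```cpp
uint32_t gpuCount = 0;
vkEnumeratePhysicalDevices(instance, &gpuCount, nullptr);
std::vector<VkPhysicalDevice> gpus(gpuCount);
vkEnumeratePhysicalDevices(instance, &gpuCount, gpus.data());

VkPhysicalDevice gpu = gpus[0]; // for brevity: take the first one

VkPhysicalDeviceProperties       properties;       // name, limits, API version...
VkPhysicalDeviceFeatures         features;         // optional capabilities
VkPhysicalDeviceMemoryProperties memoryProperties; // heaps and memory types
vkGetPhysicalDeviceProperties(gpu, &properties);
vkGetPhysicalDeviceFeatures(gpu, &features);
vkGetPhysicalDeviceMemoryProperties(gpu, &memoryProperties);
```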
Queues and queue families
A device can (or cannot) do the following four things: draw graphics, perform various computations, copy data, and work with sparse memory. These capabilities are exposed as queue families: each family supports certain capabilities (possibly all at once). Even if there were several identical families, Vulkan would still present them as a single one, so that we do not have to struggle in the code to pick the right family.
After you have chosen the family (or families) you need, you can get queues from them. A queue is where the commands for the device arrive (the device then takes them from the queue and executes them). There are not that many queues and families, by the way: NVIDIA usually exposes one family with all capabilities and 16 queues. Once you are done choosing the families and the number of queues, you can create the device.
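A minimal sketch of creating a logical device with a single graphics-capable queue, continuing from the gpu handle above:

```cpp
uint32_t familyCount = 0;
vkGetPhysicalDeviceQueueFamilyProperties(gpu, &familyCount, nullptr);
std::vector<VkQueueFamilyProperties> families(familyCount);
vkGetPhysicalDeviceQueueFamilyProperties(gpu, &familyCount, families.data());

// Find a family that supports graphics.
uint32_t graphicsFamily = 0;
for (uint32_t i = 0; i < familyCount; ++i)
    if (families[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) { graphicsFamily = i; break; }

float priority = 1.0f;
VkDeviceQueueCreateInfo queueInfo = {};
queueInfo.sType            = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queueInfo.queueFamilyIndex = graphicsFamily;
queueInfo.queueCount       = 1;
queueInfo.pQueuePriorities = &priority;

VkDeviceCreateInfo deviceInfo = {};
deviceInfo.sType                = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
deviceInfo.queueCreateInfoCount = 1;
deviceInfo.pQueueCreateInfos    = &queueInfo;

VkDevice device = VK_NULL_HANDLE;
vkCreateDevice(gpu, &deviceInfo, nullptr, &device);

VkQueue graphicsQueue = VK_NULL_HANDLE;
vkGetDeviceQueue(device, graphicsFamily, 0, &graphicsQueue);
```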
Commands, their execution and synchronization
All commands for the device are placed into a special container - the command buffer. That is, there is not a single function in Vulkan that tells the device to do something immediately and returns control to the application when the operation finishes. There are only functions that record particular commands into the command buffer (for example, draw something or copy an image). Only after recording the command buffer on the host can we send it to a queue, which, as we already know, lives in the device.
Command buffers come in two kinds: primary and secondary. A primary buffer goes straight to the queue. A secondary buffer cannot be submitted on its own - it is launched from within a primary one. Commands are recorded in the same order in which the functions are called, and they arrive at the queue in that order too, but they may complete in an almost "chaotic" order. To avoid complete chaos in the application, the Vulkan developers provided synchronization facilities.
Now, the most important thing: the host does not wait for commands and command buffers to finish executing - at least until you explicitly ask it to. After the command buffers are submitted to the queue, control returns to the application immediately.
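A minimal sketch of allocating, recording and submitting a command buffer, using the device and queue created above; vkQueueSubmit returns as soon as the work is handed over:

```cpp
VkCommandPoolCreateInfo poolInfo = {};
poolInfo.sType            = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
poolInfo.queueFamilyIndex = graphicsFamily;

VkCommandPool commandPool = VK_NULL_HANDLE;
vkCreateCommandPool(device, &poolInfo, nullptr, &commandPool);

VkCommandBufferAllocateInfo allocInfo = {};
allocInfo.sType              = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
allocInfo.commandPool        = commandPool;
allocInfo.level              = VK_COMMAND_BUFFER_LEVEL_PRIMARY; // or SECONDARY
allocInfo.commandBufferCount = 1;

VkCommandBuffer cmd = VK_NULL_HANDLE;
vkAllocateCommandBuffers(device, &allocInfo, &cmd);

VkCommandBufferBeginInfo beginInfo = {};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
vkBeginCommandBuffer(cmd, &beginInfo);
// ... vkCmd* calls are recorded here ...
vkEndCommandBuffer(cmd);

VkSubmitInfo submitInfo = {};
submitInfo.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers    = &cmd;
vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE); // control returns at once
```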
There are 4 synchronization primitives: the fence, the semaphore, the event and the barrier.
The fence is the simplest synchronization method: it lets the host wait for the device to finish certain work, for example the completion of a command buffer. Still, the fence is used rarely.
A semaphore is a way to synchronize within the device. You cannot query its state or wait on it from the host, nor can you wait on it inside a command buffer. But we can specify which semaphore should be signaled once all the commands in a submitted buffer have completed, and which semaphore to wait on before the commands in a buffer start. And it is not the whole buffer that waits, but only a particular pipeline stage of it.
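A minimal sketch of a submission that waits on one semaphore, signals another, and lets the host wait on a fence; waitSemaphore and signalSemaphore are assumed to have been created earlier with vkCreateSemaphore:

```cpp
VkFenceCreateInfo fenceInfo = {};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
VkFence fence = VK_NULL_HANDLE;
vkCreateFence(device, &fenceInfo, nullptr, &fence);

// Only this stage waits for the semaphore, not the whole buffer.
VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;

VkSubmitInfo submitInfo = {};
submitInfo.sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.waitSemaphoreCount   = 1;
submitInfo.pWaitSemaphores      = &waitSemaphore;   // assumed created earlier
submitInfo.pWaitDstStageMask    = &waitStage;
submitInfo.commandBufferCount   = 1;
submitInfo.pCommandBuffers      = &cmd;
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores    = &signalSemaphore; // assumed created earlier

vkQueueSubmit(graphicsQueue, 1, &submitInfo, fence);
vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX); // the host waits here
```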
Pipeline stages and execution dependencies
As already mentioned, the commands in a queue are not necessarily executed in order. More precisely, subsequent commands do not wait for the previous ones to finish: they may run in parallel, or a previous command may finish much later than a subsequent one. And that is perfectly normal. But some commands depend on the results of others. You can split them into two sets, "before" and "after", and also indicate which pipeline stages of the "before" set must complete (that is, the commands need not finish entirely, and not all of them) before the specified stages of the "after" set begin. For example, drawing an image may pause to do certain work and then resume drawing. There can also be whole chains of dependencies, but we will not go deep into the Siberian forests of Vulkan.
Events are a tool for fine-grained control. They can be signaled both from the host and from the device, and they can be waited on both on the device and on the host. An event defines a dependency between two sets of commands ("before" and "after") in a command buffer. For events there is also a special pseudo-stage that allows waiting for the host.
A barrier, again, can only be used in the device, or more precisely in a command buffer, declaring a dependency between a first and a second set of commands. Optionally, you can also specify memory barriers, which come in three types: the global memory barrier, the buffer memory barrier and the image memory barrier. They make sure that data currently being written is not inadvertently read and/or vice versa, depending on the specified parameters.
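A minimal sketch of an image memory barrier that makes transfer writes to an image visible before a fragment shader reads it; image is assumed to be a VkImage created earlier:

```cpp
VkImageMemoryBarrier barrier = {};
barrier.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.srcAccessMask       = VK_ACCESS_TRANSFER_WRITE_BIT;       // writes from the "before" set
barrier.dstAccessMask       = VK_ACCESS_SHADER_READ_BIT;          // reads in the "after" set
barrier.oldLayout           = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
barrier.newLayout           = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image               = image;                              // assumed created earlier
barrier.subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

vkCmdPipelineBarrier(cmd,
                     VK_PIPELINE_STAGE_TRANSFER_BIT,        // "before" stages
                     VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, // "after" stages
                     0, 0, nullptr, 0, nullptr, 1, &barrier);
```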
Pipelines
Vulkan has two pipelines: the graphics pipeline and the compute pipeline. With the graphics one we can, of course, draw, and with the compute one... compute. What else? The results of the computations can then be fed into the graphics pipeline. This way you can easily save time on a particle system, for example.
You cannot reorder or otherwise change the pipeline stages themselves; the exceptions are the programmable stages (shaders). You can also pass various data to the shaders (and not only to them) through descriptors.
For a pipeline you can create a cache, which can be reused (over and over) for other pipelines and even after the application is restarted.
The pipeline must be configured and bound to the command buffer before the latter records commands that use it.
Pipeline inheritance
Since a pipeline is in fact the complete description of how incoming data is processed (information about shaders, descriptors, rasterization and so on), switching pipelines can be costly in time. So the developers provided the ability to inherit pipelines: switching from a parent to a child, or between children, costs less in performance. It is also a convenience for developers, much like OOP.
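A minimal sketch of pipeline inheritance (derivatives), assuming a filled-in VkGraphicsPipelineCreateInfo named pipelineInfo, an existing parentPipeline created with VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT, and an optional pipelineCache:

```cpp
VkGraphicsPipelineCreateInfo childInfo = pipelineInfo;   // start from the same description
childInfo.flags              = VK_PIPELINE_CREATE_DERIVATIVE_BIT;
childInfo.basePipelineHandle = parentPipeline;           // inherit from this pipeline
childInfo.basePipelineIndex  = -1;                       // not using an index into this batch

VkPipeline childPipeline = VK_NULL_HANDLE;
vkCreateGraphicsPipelines(device, pipelineCache,         // cache may be VK_NULL_HANDLE
                          1, &childInfo, nullptr, &childPipeline);
```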
Render pass, graphics pipeline and framebuffer
So, we get the following matryoshka:
To be able to use the drawing commands, you need a graphics pipeline. In the graphics pipeline you must specify the render pass, which contains information about the subpasses, their dependencies on each other, and the attachments. An attachment describes an image that will be used in framebuffers. A framebuffer is created specifically for a particular render pass. To begin a pass, you specify both the pass itself (and, if necessary, the subpass) and the framebuffer. After the pass has begun you can draw. You can also switch between subpasses. Once drawing is finished, you can end the pass.
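A minimal sketch of that matryoshka inside a command buffer, assuming renderPass, framebuffer and graphicsPipeline were created earlier; the render area size is purely illustrative:

```cpp
VkClearValue clearColor = { { { 0.0f, 0.0f, 0.0f, 1.0f } } };

VkRenderPassBeginInfo passInfo = {};
passInfo.sType             = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
passInfo.renderPass        = renderPass;    // assumed created earlier
passInfo.framebuffer       = framebuffer;   // created for this render pass
passInfo.renderArea.extent = { 800, 600 };  // illustrative size
passInfo.clearValueCount   = 1;
passInfo.pClearValues      = &clearColor;

vkCmdBeginRenderPass(cmd, &passInfo, VK_SUBPASS_CONTENTS_INLINE);
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, graphicsPipeline);
vkCmdDraw(cmd, 3, 1, 0, 0);                            // one triangle: 3 vertices
// vkCmdNextSubpass(cmd, VK_SUBPASS_CONTENTS_INLINE);  // switch subpasses if there are several
vkCmdEndRenderPass(cmd);
```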
Memory Management and Resources
Memory in Vulkan is allocated by the host, and only by the host (with the exception of the swapchain). If an image (or other data) needs to be placed on the device, memory is allocated: first a resource of a certain size is created, then its memory requirements are queried, memory is allocated for it, then the resource is bound to a region of that memory, and only then can the necessary data be copied into the resource. There is also memory that can be modified directly from the host (host visible), the device's local memory (video card memory, for example), and other memory types that affect how fast they can be accessed.
In Vulkan you can also plug in your own host memory allocator by setting up callback functions. But keep in mind that memory requirements include not only the size but also the alignment.
Resources themselves come in two types: buffers and images. Both are further divided by purpose, but while a buffer is just a collection of various data (a vertex, index or constant buffer), an image always has a format.
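A minimal sketch of the create, query requirements, allocate, bind, copy chain described above, for a host-visible buffer; findMemoryType() is a hypothetical helper that picks a HOST_VISIBLE memory type allowed by the requirements:

```cpp
VkBufferCreateInfo bufferInfo = {};
bufferInfo.sType       = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
bufferInfo.size        = 1024;                              // illustrative size in bytes
bufferInfo.usage       = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT;
bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

VkBuffer buffer = VK_NULL_HANDLE;
vkCreateBuffer(device, &bufferInfo, nullptr, &buffer);

VkMemoryRequirements requirements; // size, alignment and allowed memory types
vkGetBufferMemoryRequirements(device, buffer, &requirements);

VkMemoryAllocateInfo allocInfo = {};
allocInfo.sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.allocationSize  = requirements.size;
allocInfo.memoryTypeIndex = findMemoryType(requirements.memoryTypeBits,          // hypothetical helper
                                           VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT);

VkDeviceMemory memory = VK_NULL_HANDLE;
vkAllocateMemory(device, &allocInfo, nullptr, &memory);
vkBindBufferMemory(device, buffer, memory, 0); // bind at offset 0

void* mapped = nullptr;
vkMapMemory(device, memory, 0, bufferInfo.size, 0, &mapped);
// memcpy(mapped, vertexData, bufferInfo.size);
vkUnmapMemory(device, memory);
```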
Advice to those writing in Vulkan
Allocate one block of memory and place several resources in it at once. The number of allocations is limited and may not be enough for you, but the number of bindings is not limited.
Shaders
Vulkan supports 6 types of shaders: vertex, tessellation control, tessellation evaluation, geometry, fragment (a.k.a. pixel) and compute. You can write them in human-readable SPIR-V assembly and then assemble that into bytecode, which we feed into the application, i.e. create a shader module from this code. Of course, we can also write them in the usual GLSL and then convert that into SPIR-V (a translator already exists). And, of course, you can write your own translator and even an assembler - the source code and specifications are open, so nothing stops you from writing a compiler from your own high-level language into SPIR-V. Or maybe someone already has.
The bytecode is then translated into commands specific to each video card, but this happens much faster than compiling from raw GLSL code. The same practice is used in DirectX: HLSL is first compiled into bytecode, and that bytecode can be saved and reused so the shaders do not have to be compiled again and again.
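A minimal sketch of turning SPIR-V bytecode into a shader module; loadSpirvFile() is a hypothetical helper that reads a compiled .spv file into a vector of 32-bit words:

```cpp
std::vector<uint32_t> spirv = loadSpirvFile("shader.vert.spv"); // hypothetical loader

VkShaderModuleCreateInfo moduleInfo = {};
moduleInfo.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
moduleInfo.codeSize = spirv.size() * sizeof(uint32_t); // size in bytes
moduleInfo.pCode    = spirv.data();

VkShaderModule shaderModule = VK_NULL_HANDLE;
vkCreateShaderModule(device, &moduleInfo, nullptr, &shaderModule);
// The module is then plugged into a pipeline via VkPipelineShaderStageCreateInfo.
```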
Windows and displays
And this article ends with a story about WSI (Window System Integration) and the swapchain. To display anything in a window or on a screen, you need special extensions.
For windows, these are the base surface extension and a surface extension specific to each system (win32, xlib, xcb, android, mir, wayland). For a display (i.e. fullscreen output), the display extension is needed; in both cases the swapchain extension is used on top.
The swapchain is not tied to the graphics pipeline, so a simple clear-screen can be done without setting all of that up. The idea is quite simple: there is a presentation engine with a queue of images; one image is shown on the screen while the others wait their turn. The number of images can also be specified, and there are several presentation modes, some of which wait for the vertical sync signal.
The flow is roughly as follows: we request the index of a free image, submit the command buffer that copies the result from the framebuffer into that image, and then issue a command to present the image from the queue. It sounds easy, but given the need for synchronization it is a little more complicated: the only thing the host waits for is the index of an image that will soon become available. The command buffer waits on the semaphore that signals the image's availability, and then signals another semaphore when the buffer (and therefore the copy) has finished. The image is actually presented when that last semaphore is signaled. So there are just two semaphores: one for the image becoming available for copying and one for the image being ready for display (i.e. the copy has finished).
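A minimal sketch of one such frame, assuming the swapchain and the two semaphores (imageAvailable, renderFinished) were created earlier; the submission in between is the one from the synchronization sketch above:

```cpp
uint32_t imageIndex = 0;
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                      imageAvailable, VK_NULL_HANDLE, &imageIndex);

// Submit the command buffer that renders/copies into the acquired image:
// it waits on imageAvailable and signals renderFinished (see the VkSubmitInfo
// sketch in the synchronization section).

VkPresentInfoKHR presentInfo = {};
presentInfo.sType              = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
presentInfo.waitSemaphoreCount = 1;
presentInfo.pWaitSemaphores    = &renderFinished; // present only after the copy is done
presentInfo.swapchainCount     = 1;
presentInfo.pSwapchains        = &swapchain;
presentInfo.pImageIndices      = &imageIndex;

vkQueuePresentKHR(graphicsQueue, &presentInfo);
```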
By the way, I checked: the same command buffer really can be submitted to the queue several times. What that implies you can figure out for yourself.
In this article I tried to cover the most important parts of the Vulkan API, but much remains untold and you can discover it for yourself. Stable FPS and pleasant coding.