
Playing your favorite games in Dolphin on a powerful computer works quite well. Games run at full speed, there are no graphical glitches, and you can use any controller. However, whenever you enter a new area or a new effect loads, there is a very small but noticeable stutter. If you turn off the frame limiter to check, you can see that the game could run much faster than full speed. So what is going on?
The slowdown when loading new areas, effects, models and so on is what users and developers usually call “shader compilation stuttering”. The problem has been present in Dolphin from the very beginning, but it has only recently started to attract attention.
Back when games barely worked, there were minor stutters too, of course, but they were not a big problem. Over time, emulation in many games improved to a nearly perfect state, while the stutters stayed unchanged for years. Since the release of Dolphin 4.0, users have complained even more about stuttering during shader compilation. This was partly due to the higher GPU requirements that came with integer math, but mostly the stutter became noticeable because the emulator had no other serious problems left.
Developers regarded shader compilation stuttering with unease, even loathing. The problem seemed unsolvable, and it caused pain and frustration in the community. Ironically, we hated the stutter more than anyone, but the sheer complexity of the task scared most developers off. Even so, a few of them quietly kept hoping. The solution began as a theory that was likely to work, a theory that would take hundreds, if not thousands, of man-hours just to test whether it could be realized.
It was this hope that set us on a difficult journey with almost no chance of success, a journey that took two years of work by several GPU specialists. All of it in order to emulate the entire programmable pipeline of the GameCube / Wii without this annoying stutter.
It was the dawn of the ubershader era.
Problem
Modern GPUs are incredibly versatile, but that versatility comes at a price: they are incredibly complex. To harness their power, developers use shaders, programs that the GPU executes in much the same way a CPU executes applications. Shaders program the GPU to implement effects and complex rendering techniques. Developers write shader code against an API (OpenGL, for example), and the shader compiler in the graphics driver translates that code into binary instructions the GPU can execute. This compilation takes computing resources and time, so modern PC games deal with it by compiling at moments when frame rate does not matter, such as during loading. Because PCs come with so many different GPUs, PC games cannot ship shaders precompiled for one specific GPU. The only way to run shaders on particular hardware is to have the graphics driver compile them at some point while the game is running.
The GameCube's GPU, Flipper, the largest chip on the motherboard. Source: Anandtech
On consoles, everything is completely different. If you know exactly what hardware the game will run on, and know that this hardware will never change, you can simply precompile the GPU programs and put them on the disc, which speeds up loading and guarantees consistent performance. This matters especially on older consoles, which have so little memory that keeping shaders in memory may not even be possible. The GameCube's GPU, called Flipper, is just such a case.
Although Flipper contains fixed-function hardware, it also has a programmable TEV (Texture EnVironment) unit that can be configured to perform a huge variety of effects and rendering techniques, much as pixel shaders do. In fact, the TEV unit's capabilities are very similar to those of the DirectX 8 pixel shaders on the Xbox! It was so versatile and powerful that Flipper, with some modifications, was reused as the Wii's GPU (now called Hollywood). Unfortunately for us, the TEV unit is designed so that games load a TEV configuration at the very moment an effect is needed. Configurations cannot be preloaded, because the TEV unit has no memory for them.
This on-demand loading is the source of all our problems. Dolphin has to translate every Flipper / Hollywood configuration a game uses into a specialized shader that the host GPU can execute. Shaders have to be compiled, and that takes time. But since the TEV unit cannot store configurations, GC / Wii games simply configure it to render an effect the moment it is needed, with no delay and no notice. To bridge this mismatch, Dolphin can only stall emulation while its video thread and the graphics driver compile the shader, in effect pausing the emulated GC / Wii console. Compilation usually fits between frames and users do not notice it, but when it takes longer than a frame, the game visibly freezes until compilation finishes. This is shader compilation stuttering. A stutter usually lasts only a couple of frames, but in very busy scenes with several shaders to compile it can last more than a second.
Until the shader cache has been built, Metroid Prime 3 gameplay is quite painful (GIF).
Dolphin was the first emulator to emulate a system with a programmable GPU at full speed, so we had to solve this problem on our own. We implemented shader caching, so the second time a configuration comes up, it no longer stutters. But building a reliable cache takes several hours of play, and replacing the computer's GPU, updating the graphics driver, or even moving to a new Dolphin version can invalidate the cache and bring the stutters back. For years it seemed that nothing could be done about shader compilation, and many wondered whether a fix was possible at all...
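The caching idea can be sketched in a few lines. This is only an illustration under assumptions, not Dolphin's actual code: `ShaderCache`, `key_for` and `fake_compile` are hypothetical names, and the "configuration" is a plain dictionary standing in for the real Flipper / Hollywood state.

```python
import hashlib
import json


class ShaderCache:
    """Toy cache mapping a pipeline configuration to a 'compiled' shader."""

    def __init__(self, compile_fn):
        self._compile = compile_fn   # the expensive step (the driver, in reality)
        self._compiled = {}          # key -> compiled shader
        self.compile_count = 0       # how many times we paid for a compile

    @staticmethod
    def key_for(config: dict) -> str:
        # Key derived only from the emulated configuration, not from host hardware.
        blob = json.dumps(config, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def get(self, config: dict):
        key = self.key_for(config)
        if key not in self._compiled:
            # Cache miss: this is the moment the emulator would stutter.
            self._compiled[key] = self._compile(config)
            self.compile_count += 1
        return self._compiled[key]


def fake_compile(config):
    # Hypothetical stand-in for the driver's shader compiler.
    return f"shader<{config['stages']} stages>"


cache = ShaderCache(fake_compile)
water = {"stages": 3, "fog": True}
cache.get(water)   # first use: compiles
cache.get(water)   # second use: served from the cache
```

The second lookup costs nothing, which is why a warm cache hides the problem. It also shows the weakness described above: anything that changes what `fake_compile` would produce (a new driver, a new GPU, a new emulator version) makes the stored entries worthless.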
Solving an unsolvable problem
Of all Dolphin's remaining problems, shader compilation stuttering was the one users complained about most. The issue kept coming up in bug tracker discussions, on forums, on social networks and on IRC. Over the years our own attitude toward it changed. At first, the stutter was not even considered a bug. What do small hitches matter when games barely work at all? Everything changed in January 2015, when the stutter was formally recognized as an issue in the Dolphin bug tracker and word about it began to spread.
In recent years users have asked many questions about the stutter, demanded a fix, declared the emulator useless, and even berated the developers for not paying enough attention to shader compilation. The truth is that we hate these stutters more than anyone, and we have been thinking about the problem for many years. Many solutions were devised, and some were even tested. But it seemed the problem could not be solved without serious side effects.
Possible solutions
Generate all shaders in advance!
For reference: there are about 7.5 × 10^15 grains of sand on Earth.
Dolphin can generate the shaders it needs quite quickly; the problem lies in compiling them. But if we could somehow generate and compile shaders for every possible configuration, that would solve the problem, right? Unfortunately, this is simply impossible.
There are approximately 5.64 × 10^511 possible configurations of the TEV unit alone, and we would have to create a unique shader for each of them. On top of that, vertex shaders are used to emulate the semi-programmable Hardware Transform and Lighting unit, and they increase the number of combinations even further.
Even if we could compile them all, the shaders would only be usable in the Dolphin version they were generated for. Upgrading to a new build would require a new set of shaders. Other changes, such as replacing the graphics card or updating the graphics driver, would also require recompilation. And all of this would only work if the driver has a low-level cache, which not every driver does.
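To get a feel for the scale, here is a back-of-the-envelope calculation using the configuration count quoted above. The compile rate of one billion shaders per second is purely an assumption made for illustration (real drivers are many orders of magnitude slower), and the arithmetic is done in log10 space because the count does not fit in a floating-point number.

```python
import math

# Figure quoted in the text: about 5.64e511 possible TEV configurations.
# That is too large for a float (max ~1.8e308), so work with log10 values.
LOG10_CONFIGS = math.log10(5.64) + 511

# Assumption (hypothetical): a driver that compiles 1e9 shaders per second.
LOG10_SHADERS_PER_SECOND = 9.0

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# years = configs / rate / seconds_per_year, expressed as a power of ten
log10_years = LOG10_CONFIGS - LOG10_SHADERS_PER_SECOND - math.log10(SECONDS_PER_YEAR)

print(f"about 10^{log10_years:.0f} years")
```

Even under this wildly optimistic assumption the result is on the order of 10^495 years, so exhaustive precompilation is ruled out regardless of hardware.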
Predict the shaders the game will need!

If we could generate and compile shaders only during loading screens and similar moments, the stutters would go unnoticed. But implementing prediction well enough to solve the problem is simply impossible. Predicting by running the emulation ahead and guessing the player's inputs would cost too much in performance and implementation complexity for the few situations where it might help.
Blind prediction does not work either: the game can pick configurations at run time without any warning, and previous configurations tell us nothing about the next ones. The only way to find out which shaders a game needs is to play through it and hit every configuration it might use.
... Which led us to another proposed solution.
Common shaders

To describe a configuration of the emulated GPU, Dolphin uses an object called a “Unique ID” (“UID”). UIDs are turned into shader code and handed to the graphics driver for compilation. Because UIDs are created before compilation and are not tied to any particular host GPU, they are compatible with any computer and could in theory be shared. If users shared UID files, shaders could be compiled ahead of time and the stutter would not occur. Dolphin's Vulkan backend already has this feature, which it needs to work around shader caching problems in some drivers.
So why was this solution never implemented?
- Dolphin keeps improving. Whenever graphics improvements are made, all existing UIDs would have to be thrown away.
- Not every game could be covered. Popular games would have a nearly complete collection of UIDs, but we could not help those who play lesser-known gems.
- Testing showed that different games share very few UIDs. The Legend of Zelda: The Wind Waker and The Legend of Zelda: Twilight Princess have only a small share of configurations in common (15%), and that is with both running on the same base engine. Most games have far less in common, so sharing data from popular games would certainly not help lesser-known ones.
- Users might still be missing some UIDs. The number of configurations is nearly infinite. Even a 100% playthrough does not guarantee that you have hit them all.
The developers weighed this solution for a while, but discussions about the infrastructure for exchanging UIDs and about a good way to distribute them produced more arguments than answers. Such a system could be used to improve an already working solution, but it could not be the solution itself.
Asynchronous Shader Compilation
Asynchronous shader compilation, popularized by a fork maintained by Tino, is an unconventional answer to the shader compilation dilemma.
Tino approached the problem in much the same way some modern games handle dynamic compilation of new shaders. When the player enters a new area, new objects sometimes simply pop into existence because they are loaded dynamically. He wondered whether a similar result could be achieved in the emulator, and in his fork he began rewriting the way shaders were handled.
Asynchronous shader compilation changes what Dolphin does when it finds no cached shader for the Flipper / Hollywood configuration it has just encountered. Instead of pausing the game and waiting for the compiler to finish, it simply skips rendering the object. This means there are no pauses or stutters, but some objects may be missing from the frame until their shader is ready.
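The behavior described above can be modeled roughly as follows. This is a simplified sketch, not the fork's actual implementation: `AsyncShaderRenderer`, `slow_compile` and `wait_idle` are hypothetical names, and a `threading.Event` stands in for a slow driver compile so the example behaves deterministically.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncShaderRenderer:
    """Toy model: never block a draw call on shader compilation."""

    def __init__(self, compile_fn):
        self._compile = compile_fn                   # slow step (the driver, in reality)
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = {}                           # config key -> Future
        self._ready = {}                             # config key -> compiled shader

    def draw(self, key, obj):
        """Return the rendered result, or None if the object is skipped this frame."""
        if key not in self._ready:
            job = self._pending.get(key)
            if job is None:
                # First sighting of this configuration: compile in the background.
                self._pending[key] = self._pool.submit(self._compile, key)
                return None
            if not job.done():
                return None                          # still compiling: skip the object
            self._ready[key] = self._pending.pop(key).result()
        return self._ready[key](obj)

    def wait_idle(self):
        """Block until background compiles have finished (demo helper only)."""
        for job in list(self._pending.values()):
            job.result()


gate = threading.Event()

def slow_compile(key):
    gate.wait()                                      # stands in for a slow driver compile
    return lambda obj: f"{obj} drawn with {key}"

renderer = AsyncShaderRenderer(slow_compile)
first_frame = renderer.draw("cfg-A", "crate")        # shader missing: object skipped
gate.set()                                           # let the "driver" finish
renderer.wait_idle()
later_frame = renderer.draw("cfg-A", "crate")        # shader ready: object appears
```

The frame rate never depends on the compiler here, but `first_frame` is empty: whatever the object was supposed to contribute to that frame is simply lost.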
For some games this method worked well. Some game engines cull so loosely that objects outside the camera's field of view, or covering only a few pixels on screen, are still rendered. In those cases skipping such objects was barely noticeable. In other games, however, it led to the pop-in effect described above.
When shader compilation was skipped, objects could appear out of thin air and the graphics looked broken (GIF). But gameplay stayed smooth!
Users kept asking: why were Tino's asynchronous shaders not merged into Dolphin, at least as an option for dealing with shader compilation stuttering? It came down to the fact that the people capable of implementing the feature, along with other core developers, were against it. They saw it as nothing more than a hack that would flood the bug tracker with false reports and cause even bigger problems down the road. In some ways they were right: it became clear that some games need objects to be rendered in exactly the frame where they are expected. For example, Mii heads are rendered only once, into the Embedded Framebuffer (EFB). If that EFB copy was missed because of asynchronous shader compilation, the Mii heads were not displayed until the game ended or until they were regenerated.
Headless Mii
Despite all the flaws, users of Tino's fork believed in asynchronous shader compilation. Asynchronous shaders might cause problems, but what mattered was that they got rid of the stutter. Because of its obvious flaws it could not be merged into Dolphin's master branch, but it certainly underlined how serious the problem of compiling generated shaders was. Tino's work on asynchronous shader compilation showed us clearly how much this problem bothers users, and it motivated the team even more to find a better solution.
Solution
Write an interpreter for the GameCube / Wii rendering pipeline inside shaders and run it on the host GPU
Sometimes the best way to solve an unsolvable problem is to look at it from a different angle. Whatever we tried, there was no way to compile specialized shaders as fast as games change configurations.
But what if we did not rely on specialized shaders at all? We had a crazy idea: emulate the rendering pipeline itself with an interpreter that runs directly on the GPU as a set of huge general-purpose shaders. If these huge shaders are compiled when the game starts, then whenever the game changes the Flipper / Hollywood configuration to render an effect, the “ubershaders” configure themselves and render it without any new shaders being needed. In theory, this solves shader compilation stuttering by avoiding compilation entirely.
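The difference between a specialized shader and an interpreter-style shader can be illustrated with a toy model. Everything here is an assumption made for illustration: the "configuration" is a short list of arithmetic stages rather than real TEV state, `exec` stands in for the driver's compiler, and the function names are hypothetical.

```python
# Toy "pipeline configuration": a list of (operation, constant) stages.
OPS = {
    "add": lambda acc, k: acc + k,
    "sub": lambda acc, k: acc - k,
    "mul": lambda acc, k: acc * k,
}
SYMBOLS = {"add": "+", "sub": "-", "mul": "*"}


def generate_specialized_source(config):
    """Specialized path: emit source code for exactly this configuration."""
    expr = "x"
    for op, k in config:
        expr = f"({expr} {SYMBOLS[op]} {k!r})"
    return f"def shade(x):\n    return {expr}\n"


def compile_specialized(config):
    """Stand-in for the slow driver compile; needed once per new configuration."""
    namespace = {}
    exec(generate_specialized_source(config), namespace)
    return namespace["shade"]


def uber_shade(config, x):
    """Interpreter path: one fixed function that reads the config as data."""
    acc = x
    for op, k in config:
        acc = OPS[op](acc, k)
    return acc


config_a = [("mul", 0.5), ("add", 0.25)]
config_b = [("sub", 0.1), ("mul", 2.0)]

# A new configuration needs a new compile on the specialized path...
shade_a = compile_specialized(config_a)
shade_b = compile_specialized(config_b)

# ...while the interpreter handles any configuration with no compile at all.
same_a = shade_a(0.8) == uber_shade(config_a, 0.8)
same_b = shade_b(0.8) == uber_shade(config_b, 0.8)
```

The interpreter does more work per pixel (a loop and a lookup per stage instead of a baked-in expression), which mirrors the trade-off in the text: no compile-time stalls, at the cost of a heavier shader.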
The idea seemed insane, but it was the first one with the potential to solve this unsolvable problem. The difficulty lay in the absurd amount of work and knowledge needed just to reach the point where it could be tested. To put it in perspective: among all Dolphin developers, only two or three people at best knew not only the GameCube / Wii hardware but also the modern GPUs, APIs and drivers well enough to write and optimize such shaders. And that is before considering that running an interpreter as huge shaders is no easy task for a GPU. Many feared that the result of all this work would not run at full speed even on modern graphics cards.
Just to find out whether the idea was viable would take hundreds, if not thousands, of hours of mind-numbing yet difficult work.
The first attempt was made in 2015, when the developer phire got so tired of the stutter on his powerful new computer that he wrote up a proposal and built a framework for ubershaders. He was aware of all the difficulties, yet he seemed determined to prove that ubershaders were the answer to our age-old problem. Single-handedly, phire set out to teach Dolphin how to render all over again.
This is not a graphic filter.
It seems there are a couple of glitches here ...
Because of its simplicity, SM64 was one of the first games to render anything through ubershaders.
After about a month of work on the feature, he got the pixel ubershaders to the point where some games looked almost the same as they did with the fast specialized shaders. The surprise was not that they worked, but that the ubershader prototypes ran games at full speed. phire himself recalls that his first reaction was: wow, they really do run at full speed. He admitted that GPUs should not have been able to handle this at playable speeds, yet they did. Against all expectations, the prototypes proved that ubershaders could be the solution to shader compilation stuttering. Further work then improved the accuracy of the ubershaders, fixed many bugs and implemented missing features.
In the very beginning, ubershaders turned games into a picture of warped reality.
But the situation improved quickly.
Before we could blink, Wind Waker was rendering with only a few errors.
phire quickly achieved perfect rendering in Wind Waker. Unfortunately, other games that use a wider range of features required much more work.
Having brought the ubershader project this far, phire was completely exhausted. On top of that, he still had a lot of debugging to do on other projects for the Dolphin 5.0 release. The delays took their toll: between burnout and worries about driver and API limitations, phire lost all his drive. Although roughly 90% was done, the other 90% still remained, including several important features.
- Finishing the vertex ubershaders
- Infrastructure connecting the pixel and vertex ubershaders
- Resolving performance issues in OpenGL and (after a rebase) in Vulkan
- Cleaning up the code, fixing bugs and matching the rendering output of the specialized shaders
- GUI options
- Additionally: a hybrid mode for integrated and weak GPUs
It was painful to see this much work left hanging. But no developers could be found who were both able and willing to take on such a huge project. Even those who did decide to work on it were not prepared to clean up the code, fix bugs and build the infrastructure. For more than a year ubershader development sat idle, the list of unfinished features kept growing, and hope gradually faded...
Ubershaders 2.0
Shader compilation stuttering was one of Dolphin's most noticeable flaws, so users did not forget about ubershaders after their development stalled. The long-abandoned pull request kept collecting comments, the problem was discussed on the forums, and it was even reported in various forms in the bug tracker.
Ubershaders remained the first real hope of eliminating the stutter, and they came up in discussions every month. The progress already made only fueled the community's interest in the solution. After a great many requests, complaints and even blackmail, Stenzek reluctantly started working on ubershaders.
Even before Stenzek took on ubershaders, the team had made decisions about which graphics APIs to support. One of them, dropping D3D12, met with a mixed if not negative reception. Unlike with D3D9, we did not want to go through a gradual deprecation and removed it right away, since it had become obvious that nobody wanted to maintain that backend.
It turned out to be a good decision, because getting rid of that API made it possible to revive the ubershader project once Stenzek was ready. He was the architect of Dolphin's Vulkan backend, so he was willing to do the extra work to get ubershaders running on Vulkan.
When the pixel and vertex ubershaders were finally merged together and ready to run, testers immediately tried them on the most difficult games. Since none of the previous solutions had worked properly for Metroid Prime 3, that game was the first candidate.
Metroid Prime 3 was one of the few games whose rating had been lowered to unplayable because of shader stutter. Until now (GIF)!
The first ubershader test was a huge success: the stutters disappeared completely in D3D, and in OpenGL and Vulkan only a few odd hitches showed up early on. As work on ubershaders continued, we greatly improved their behavior in all APIs, with a few exceptions that I will discuss later. But simply running games on ubershaders was not enough: they consume a large share of the host GPU's resources. Requirements differ from game to game, of course, but GPU load typically depended heavily on the resolution the game was running at. Most graphics cards coped at 1x native resolution (480p), and more powerful cards could even handle 1080p or higher using ubershaders alone. Unfortunately, many of our users did not have the hardware needed to run ubershaders at the resolution they were used to, so they had to choose between resolution and smoothness.
Intel's integrated GPUs barely cope with Dolphin's specialized shaders at high resolutions, let alone ubershaders.
A very large share of Dolphin users have integrated GPUs in their computers. In testing, integrated GPUs managed at best only about 50% of full speed in 3D games with ubershaders at 1x resolution! The developers realized it would be a mistake to ignore such a large share of Dolphin users and made ubershaders optional. Work continued on a more robust solution that could settle the performance problems once and for all.
Hybrid Ubershader Mode
Hybrid ubershader mode combines ubershaders and asynchronous shader compilation into one elegant solution that takes the best of each approach while avoiding their drawbacks. Because hybrid mode greatly reduces the GPU cost of ubershaders, we expected it to be the most popular ubershader mode.
In hybrid mode, when a new pipeline configuration appears, Dolphin uses the already compiled ubershaders to render the effect immediately, without stuttering, while a specialized shader is compiled in the background. Once the specialized shader is ready, Dolphin hands rendering of those objects over from the ubershader to the generated specialized shader.
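The hand-over can be sketched as below. This is a minimal model under assumptions, not Dolphin's implementation: `HybridRenderer`, `uber_shade`, `slow_compile` and `wait_idle` are hypothetical names, a configuration is reduced to a `(scale, offset)` pair, and a `threading.Event` stands in for the slow driver compile.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def uber_shade(config, x):
    """Generic path: interprets the configuration as data on every call."""
    scale, offset = config
    return x * scale + offset


class HybridRenderer:
    """Toy model: render with the generic shader until a specialized one is ready."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._jobs = {}                              # config -> Future of specialized shader

    def draw(self, config, x):
        """Return (path_used, result); never blocks on compilation."""
        job = self._jobs.get(config)
        if job is None:
            # New configuration: start the specialized compile in the background.
            self._jobs[config] = self._pool.submit(self._compile, config)
        elif job.done():
            return "specialized", job.result()(x)
        return "uber", uber_shade(config, x)         # render right now, no stall

    def wait_idle(self):
        """Block until background compiles have finished (demo helper only)."""
        for job in list(self._jobs.values()):
            job.result()


gate = threading.Event()

def slow_compile(config):
    gate.wait()                                      # stands in for a slow driver compile
    scale, offset = config
    return lambda x: x * scale + offset              # configuration baked into the closure

renderer = HybridRenderer(slow_compile)
config = (0.5, 0.25)
before = renderer.draw(config, 0.8)                  # compile still running
gate.set()                                           # let the "driver" finish
renderer.wait_idle()
after = renderer.draw(config, 0.8)                   # specialized shader has taken over
```

Unlike the purely asynchronous approach, nothing is ever skipped: the object is drawn on both calls with the same result, and only the path that produced it changes.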
Assuming the drivers and APIs behave the way we need them to, this is an excellent solution. Since ubershaders run for only some of the objects in a scene and not in every frame, the performance impact is almost imperceptible, and the stutters are eliminated entirely. Unfortunately, drivers and APIs are not perfect, which limits how effective hybrid mode is on some machines. And that brings us to...
The Ubershader Hall of Shame: APIs and Drivers
: , . , API. - .
. , - /API, , , .
. , , . . , , .
, (, Mesa) , . . , , Vulkan Mesa .
NVIDIA OpenGL Vulkan
, OpenGL Vulkan ( ) . , , D3D, , NVIDIA, Dolphin. .
NVIDIA OpenGL Vulkan , D3D
, . OpenGL, Vulkan D3D, , D3D
. , GTX 760 OpenGL Vulkan 1x, D3D .
NVIDIA , ,
. , D3D. , : , Dolphin NVIDIA, . , . .
, — . : NVIDIA , Direct3D 12 ( ), API . , API .AMD Vulkan -
, ! AMD Vulkan ! , . , .
macOS -
, macOS, , « macOS...». Here she is. , OpenGL 4.1 macOS - .
, . : macOS - .
, , API .
. . , , , (anti-aliasing) , . , , .
- Intel Windows
- Hybrid D3D . Exclusive Mode ( ) , Intel , «» 1x.
- OpenGL .
- Vulkan Skylake , .
- Intel Linux
- Hybrid Vulkan . Exclusive Mode , .
- Anv .
- Intel i965 OpenGL . , , . Exclusive Mode , , Hybrid Mode .
,- AMD Windows
- Hybrid D3D .
- Exclusive Mode D3D Vulkan .
- OpenGL AMD .
- AMD Linux
- Exclusive Hybrid Vulkan .
- radv anv .
,- NVIDIA Windows
- Hybrid D3D OpenGL .
- Exclusive D3D , OpenGL Vulkan . D3D , OpenGL Vulkan, .
- NVIDIA Linux
- Hybrid OpenGL .
- Exclusive OpenGL Vulkan . , API . , Vulkan , .
- NVIDIA Android
- Hybrid OpenGL .
- Exclusive OpenGL Vulkan . Exclusive NVIDIA Shield TV .
,PowerVR Android- Not recommended. - , Hybrid Mode . .
Adreno Android- Not recommended. Hybrid Mode , Exclusive Mode . .
Mali AndroidFinally
. , . , . , , Exclusive Mode , . Vulkan , , Hybrid Mode . , , ,
, Dolphin
. , JIT, . JIT Dolphin , JIT (, N64 VC). ,
, . , , , , , .
- .
. , !