
Optimizing Unity-based Android games for the Intel platform: a real-world example

Unity is one of the most popular game engines for mobile platforms (Android and iOS), and many developers use it to create and release games. Before Unity supported Android devices built on the Intel platform, games ran on those devices under emulation, which translated low-level ARM code into x86 code. As a result, some games that were not originally designed for x86 did not start at all, or ran with performance problems. Today, with the growing presence of Intel processors in mobile devices, many developers are interested in supporting Android devices based on the x86 architecture and want to know how to optimize games for this hardware.

In this article, we show the performance gains that can be achieved by building Android applications natively for the x86 platform, and we share game optimization tips using Hero Sky: Epic Guild Wars as an example.


About Hero Sky: Epic Guild Wars

Innospark, the developer of Hero Sky: Epic Guild Wars, has extensive experience creating mobile games with various commercial game engines, and it also has its own engine. Hero Sky: Epic Guild Wars is the company's first game developed with Unity and released on the global market. After the game was published on Google Play and downloads grew, the company began receiving user complaints: on some Android devices based on the Intel platform the game did not run at all, and on others its performance left much to be desired. The company therefore decided to port the game to the x86 platform and optimize it. Here we describe how Hero Sky: Epic Guild Wars was optimized based on profiling results from Intel Graphics Performance Analyzers (Intel GPA). In particular, we discuss how draw order and alpha blending affect performance.

Preliminary Information


Hero Sky: Epic Guild Wars is an online 3D military strategy game. Innospark developed and optimized it on a reference device based on the Intel Atom processor (Bay Trail) with an 8-inch screen. The table below lists the device's system characteristics and its 3DMark results.

Parameter | Specification
CPU | Intel Atom processor, quad-core, 1.46 GHz
OS | Android 4.4.4
RAM | 2 GB
Screen resolution | 1920x1200
3DMark Ice Storm Unlimited (overall score) | 10386
Graphics score | 9274
Physics score | 17899

Here is a graph comparing the performance of native and emulated code on this device.


Performance gains that can be achieved with x86 support

When evaluating these results, keep in mind that the software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests such as SYSmark and MobileMark are run on specific systems with specific hardware components and software installed, and they perform specific sets of operations. Any change to any of these factors can affect the results. This applies to the device's software and hardware as well as to the test application and the test suite themselves. Therefore, before making any decision based on benchmarks, such as buying a device, consult as many sources of information as possible, including how the hardware you are interested in performs in combination with other hardware. Learn more about performance here.

After the game was ported to the x86 architecture, CPU load decreased by about 7.1%, FPS increased by 27.8%, and execution time decreased by about 32.6%. GPU load, however, increased by 26.7%, because the GPU was now rendering more frames per second.

During development, Innospark used Intel GPA to find CPU and GPU performance bottlenecks. The profiling data was then used to fix graphics problems and improve the game's performance.

At the beginning of the optimization work, the Intel GPA System Analyzer measured 51.09 FPS, which was taken as the baseline. The Graphics Frame Analyzer, which measures FPS on the GPU side only, reported 120.9 FPS. The two values differ because System Analyzer monitors the running process in real time, covering both CPU and GPU work, whereas Graphics Frame Analyzer tracks only the GPU's work plus the CPU activity involved in sending data to the driver and the GPU.
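To compare the two readings on the same scale, a frame rate can be converted into the average time spent per frame (1000 ms divided by FPS). The short Python sketch below does only this conversion; the function names are hypothetical, and it makes no assumptions about how Intel GPA computes its figures internally.

```python
def fps_to_frame_ms(fps: float) -> float:
    """Average time per frame, in milliseconds, for a given frame rate."""
    return 1000.0 / fps


def frame_ms_to_fps(frame_ms: float) -> float:
    """Frame rate implied by an average frame time in milliseconds."""
    return 1000.0 / frame_ms


# Readings quoted in the text: System Analyzer (CPU + GPU, real time)
# versus Graphics Frame Analyzer (GPU side only).
readings = [("System Analyzer", 51.09), ("Graphics Frame Analyzer", 120.9)]

for name, fps in readings:
    print(f"{name}: {fps} FPS = {fps_to_frame_ms(fps):.2f} ms per frame")
```

The System Analyzer baseline corresponds to about 19.57 ms per frame and the Graphics Frame Analyzer reading to about 8.27 ms, so roughly 11 ms of each real-time frame falls outside what the GPU-side measurement covers.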

Detailed application analysis using Graphics Frame Analyzer



Screenshot of the original version of the application

Immediately after being ported to the x86 platform, the game ran at 59.01 FPS. It was then analyzed in detail with Graphics Frame Analyzer in order to reduce GPU load (GPU Busy) and CPU load (CPU Load). The table below shows the data obtained with Graphics Frame Analyzer.
Metric | Value
Total number of primitives | 4376
GPU time | 8.58 ms
Frame render time | 9.35 ms

This is the load that the original, non-optimized version of the game placed on the system. The following table lists the draw calls that consume the most resources.
Draw call | Erg number | GPU duration | GPU memory read | GPU memory write
Sky | 1 | 1.43 ms | 0.2 MB | 7.6 MB
Grass (ground) | 5 | 1.89 ms | 9.4 MB | 8.2 MB

Analysis and optimization aimed at reducing the load on the system


Eliminate unnecessary alpha blending


With alpha blending, the GPU has to combine, in real time, the color values of all overlapping objects and the background to work out the final color of each pixel. Rendering alpha-blended colors can therefore be more expensive than rendering opaque ones, because the color already in the framebuffer must be read back before the new color is written. On slow devices this extra work can hurt performance, so the team decided to eliminate unnecessary alpha blending.
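The extra cost of blending can be illustrated with a toy framebuffer that counts color-buffer accesses. This is a hypothetical Python sketch, not how a GPU or Unity implements blending: the `Framebuffer` class and its methods are invented for illustration, and real hardware caches and compresses framebuffer traffic. It uses the standard "over" blend equation, result = src * alpha + dst * (1 - alpha).

```python
class Framebuffer:
    """Toy RGB color buffer that counts how often pixels are read and written."""

    def __init__(self, width, height):
        self.pixels = [(0.0, 0.0, 0.0)] * (width * height)
        self.reads = 0
        self.writes = 0

    def draw_opaque(self, color):
        # Opaque output overwrites the old value; the destination is never read.
        for i in range(len(self.pixels)):
            self.pixels[i] = color
            self.writes += 1

    def draw_blended(self, color, alpha):
        # "Over" blending: out = src * alpha + dst * (1 - alpha).
        # The current destination color has to be read for every pixel.
        for i in range(len(self.pixels)):
            dst = self.pixels[i]
            self.reads += 1
            self.pixels[i] = tuple(s * alpha + d * (1.0 - alpha)
                                   for s, d in zip(color, dst))
            self.writes += 1


grass_color = (0.2, 0.6, 0.1)

opaque = Framebuffer(4, 4)
opaque.draw_opaque(grass_color)

blended = Framebuffer(4, 4)
blended.draw_blended(grass_color, alpha=1.0)  # fully opaque, but blending is on

print(opaque.pixels == blended.pixels)  # True: same image
print(opaque.reads, blended.reads)      # 0 16
```

Drawing a fully opaque surface with blending enabled (alpha = 1.0) produces exactly the same pixels as an opaque draw, yet every pixel still costs an extra read; that is the sense in which such blending is "unnecessary".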

Graphics Frame Analyzer can modify the state of individual draw calls. This lets developers test how a change would affect performance without modifying the code. The function is available on the Blend State tab in the State group.


Here's how to enable and disable alpha blending in the Graphics Frame Analyzer without modifying the application source code.

The table below shows detailed results for drawing the grass after alpha blending was turned off. GPU duration decreased by about 26%, and reads from memory dropped by 97.2%.

Metric | Baseline | Unnecessary alpha blending disabled (grass)
GPU cycles | 1466843 | 1085794.5
GPU duration | 1896.6 µs | 1398.4 µs
GPU memory read | 9.4 MB | 0.2 MB
GPU memory write | 8.2 MB | 8.2 MB

Using Z-culling effectively


When a 3D graphics card renders objects, shapes in three-dimensional space (x, y, z) are projected onto the two-dimensional screen, where their position is given by the x and y coordinates. The Z-buffer, or depth buffer, stores the depth (z coordinate) of each screen pixel. If two objects in the scene cover the same pixel, the GPU compares their depths and replaces the pixel's current color with the new one only if the new object is closer to the viewer. This depth test reproduces the familiar perception of depth: closer objects hide the objects behind them. Z-culling improves performance by discarding hidden surfaces before they are shaded, but it works best when nearer objects are drawn first.
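The depth test can be modeled in software to show why draw order matters for cost but not for the final image. In this hypothetical sketch, `render` and the 10-pixel scene are invented for illustration; it assumes, as with early Z, that a fragment is shaded and written only after it passes a LESS depth test, and it ignores everything else a real GPU pipeline does.

```python
import math


def render(draws, num_pixels):
    """Draw (name, depth, pixels) tuples in order with a LESS depth test.

    Smaller depth means closer to the viewer. Returns the final color
    buffer and the number of fragments that passed the depth test, which
    in this toy model are the only ones shaded and written.
    """
    depth_buffer = [math.inf] * num_pixels  # cleared to "infinitely far"
    color_buffer = [None] * num_pixels
    fragments_written = 0
    for name, depth, pixels in draws:
        for p in pixels:
            # Keep the new fragment only if it is closer than the stored one.
            if depth < depth_buffer[p]:
                depth_buffer[p] = depth
                color_buffer[p] = name
                fragments_written += 1
    return color_buffer, fragments_written


# Hypothetical scene: the sky covers all 10 pixels far away,
# the grass covers pixels 2..9 closer to the camera.
sky = ("sky", 1.0, range(10))
grass = ("grass", 0.5, range(2, 10))

image_sky_first, written_sky_first = render([sky, grass], 10)
image_grass_first, written_grass_first = render([grass, sky], 10)

print(image_sky_first == image_grass_first)     # True
print(written_sky_first, written_grass_first)   # 18 10
```

Both orders produce the same image, but when the nearer grass is drawn first, the hidden sky pixels fail the depth test and are never shaded, so 10 fragments are written instead of 18.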

The game draws its environment in two draw calls: the sky (erg 1) and the grass (erg 5). Since most of the sky lies behind the grass, a large area of the sky is never visible during the game. However, the sky is drawn first, which prevents Z-culling from working effectively: sky pixels are shaded and written even though the grass later covers them.


Draw calls for the sky (erg 1) and the grass (erg 5)

Here is how GPU duration changed after the draw order was changed.


GPU load before and after changing the draw order, in Graphics Frame Analyzer.

The table below shows detailed results for drawing the sky after the draw order was changed. GPU duration decreased by 88%, and the amount of data written to memory fell by about 98.9%.

Metric | Baseline | Draw order changed (sky)
GPU cycles | 1113276 | 133975
GPU duration | 1443 µs | 174.2 µs
Early Z rejects | 0 | 2145344
Samples written | 2165760 | 20416
GPU memory read | 0.2 MB | 0.0 MB
GPU memory write | 8.2 MB | 0.1 MB
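The sample counts in the sky table can be cross-checked with a few lines of arithmetic. This sketch only restates the table's numbers (nothing here is measured) and assumes, as the column names suggest, that after reordering each sky sample either passes the early depth test and is written, or is rejected by it.

```python
# Sky draw call (erg 1) figures, copied from the table above.
samples_written_before = 2_165_760  # sky drawn first: every sample is written
samples_written_after = 20_416      # grass drawn first
early_z_rejects_after = 2_145_344   # samples discarded by the early depth test
gpu_cycles_before = 1_113_276
gpu_cycles_after = 133_975

# Assumption: each sky sample is either written or rejected early.
assert samples_written_after + early_z_rejects_after == samples_written_before

visible_share = samples_written_after / samples_written_before
cycle_reduction = 1 - gpu_cycles_after / gpu_cycles_before

print(f"sky samples that remain visible: {visible_share:.1%}")
print(f"sky samples rejected by early Z: {1 - visible_share:.1%}")
print(f"reduction in GPU cycles:         {cycle_reduction:.1%}")
```

Under that assumption, less than 1% of the sky's samples end up on screen, which is consistent with the 88% drop in GPU cycles for this draw call.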

Results


The table below shows detailed results of the x86 optimization after unnecessary alpha blending was eliminated and the draw order was changed. GPU duration decreased by about 25%, while memory reads and writes decreased by 42.6% and 30.0%, respectively. System Analyzer showed FPS increasing by only 1.06, because Android uses vertical sync, which caps the frame rate at 60 frames per second. The frame rate calculated by Graphics Frame Analyzer, however, increased by 29.7%.

Metric | Basic x86 version | Optimized version
GPU cycles | 6654210 | 4965478
GPU duration | 8565.2 µs | 6386 µs
Early Z rejects | 16592 | 3348450
Samples written | 6053311 | 2813997
GPU memory read | 20.9 MB | 12.0 MB
GPU memory write | 28.6 MB | 20.0 MB
FPS (System Analyzer) | 59.01 | 60.07
FPS (Graphics Frame Analyzer) | 120.9 | 156.8
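The percentages quoted in this section can be recomputed from the results table with a small helper. This is a plain arithmetic sketch; `percent_change` is a hypothetical name, the figures are copied from the table, and small differences from the quoted values probably come from the rounding of the table entries.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (metric, basic x86 version, optimized version), from the results table.
results = [
    ("GPU duration, us", 8565.2, 6386.0),
    ("GPU memory read, MB", 20.9, 12.0),
    ("GPU memory write, MB", 28.6, 20.0),
    ("FPS, Graphics Frame Analyzer", 120.9, 156.8),
]

for metric, before, after in results:
    print(f"{metric}: {percent_change(before, after):+.1f}%")
```

This gives about -25.4%, -42.6%, -30.1% and +29.7%, matching the "about 25%", 42.6%, 30.0% and 29.7% in the text to within rounding.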
Here are the main indicators for the basic and optimized versions of the application, presented in the form of a graph.


Key figures before and after optimization
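The effect of vertical sync on the System Analyzer reading can be illustrated with a deliberately simplified model in which the displayed frame rate is the smaller of the GPU-side frame rate and the 60 Hz cap mentioned in the results. It ignores double and triple buffering, frame pacing, and the details of Android's compositor, so treat it as an illustration rather than a description of how Android schedules frames; `displayed_fps`, `REFRESH_HZ` and `VSYNC_INTERVAL_MS` are hypothetical names.

```python
REFRESH_HZ = 60.0                        # vsync cap quoted in the text
VSYNC_INTERVAL_MS = 1000.0 / REFRESH_HZ  # time between refreshes, ~16.67 ms


def displayed_fps(uncapped_fps: float, refresh_hz: float = REFRESH_HZ) -> float:
    """Simplified vsync model: frames cannot be shown faster than the display refreshes."""
    return min(uncapped_fps, refresh_hz)


# GPU-side frame rates reported by Graphics Frame Analyzer.
for label, fps in [("basic x86", 120.9), ("optimized", 156.8)]:
    frame_ms = 1000.0 / fps
    print(f"{label}: {frame_ms:.2f} ms per frame on the GPU "
          f"(vsync interval {VSYNC_INTERVAL_MS:.2f} ms) -> "
          f"{displayed_fps(fps):.0f} FPS on screen")
```

Under this model both versions are shown at 60 FPS, which is why System Analyzer barely moved (59.01 to 60.07), while the per-frame GPU time, well inside the 16.67 ms vsync interval in both cases, is where the optimization becomes visible.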

Conclusions


When you start optimizing a game for Android on x86, first port it to the platform, then look for bottlenecks. Profiling tools help you measure performance and find GPU-related problems. Intel GPA, a powerful analysis tool, lets you experiment with an application's graphics without changing its source code.

Source: https://habr.com/ru/post/275927/

