Unity is one of the most popular game engines for mobile platforms (Android and iOS), and many developers use it to create and release games. Before Unity supported Android devices built on Intel platforms, games ran through an emulation layer that translated low-level ARM code into x86 code. As a result, some games that were not originally designed for x86 failed to start under emulation or ran poorly. Today, with the growing presence of Intel processors in mobile devices, many developers want to support x86-based Android devices and to know how to optimize their games for this hardware.
In this article, we show how much performance can be gained by building Android applications natively for the x86 platform, and we share game optimization tips using Hero Sky: Epic Guild Wars as an example.
The game Hero Sky: Epic Guild Wars
Innospark, the developer of Hero Sky: Epic Guild Wars, has extensive experience in creating mobile games with various commercial game engines, and also has an engine of its own. Hero Sky: Epic Guild Wars is the company's first game developed with Unity and released worldwide. After the game was published on Google Play and downloads grew, the company began to receive user complaints: on some Intel-based Android devices the game did not run at all, and on others its performance left much to be desired. As a result, the company decided to port the game to the x86 platform and optimize it. Here we describe how Hero Sky: Epic Guild Wars was optimized using profiling results from Intel Graphics Performance Analyzers (Intel GPA). In particular, we look at how draw order and alpha blending affect performance.
Preliminary Information
Hero Sky: Epic Guild Wars is an online 3D military strategy game. Innospark developed and optimized the game on a system based on the Intel Atom processor (Bay Trail). The table lists the characteristics of the reference device used in the tests, which has an 8-inch screen, along with its 3DMark results.
| Indicator | Characteristic |
|---|---|
| CPU | Intel Atom processor, quad-core, 1.46 GHz |
| OS | Android 4.4.4 |
| RAM | 2 GB |
| Screen resolution | 1920x1200 |
| 3DMark Ice Storm Unlimited score | 10386 |
| Graphics score | 9274 |
| Physics score | 17899 |
Here is a graph comparing the performance of native and emulated code on a device.
Performance gains that can be achieved with x86 support
When evaluating the test results, keep in mind that the software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests such as SYSmark and MobileMark are run on specific systems with specific hardware components and software, and perform specific sets of operations. Any change to these factors, including the software and hardware components of the device, the test application, and the test suite, can affect the results. Before making any decision based on tests, such as a purchase, consult as many sources of information as possible, including the results shown by the equipment you are interested in when it is combined with other equipment. Learn more about performance here.
After the game was ported to the x86 architecture, CPU load decreased by about 7.1%, FPS increased by 27.8%, and execution time decreased by about 32.6%. However, GPU load increased by 26.7% because of the higher frame rate.
During development, Innospark used Intel GPA to find CPU and GPU performance bottlenecks. The analysis data was used to solve graphics problems and improve game performance.
At the start of optimization, the Intel GPA System Analyzer showed 59.01 FPS, which was taken as the baseline. The Graphics Frame Analyzer, which measures FPS only on the GPU side, showed 120.9 FPS. The values differ because System Analyzer monitors the running process in real time, which includes both CPU and GPU work, while the Graphics Frame Analyzer tracks only GPU work and the CPU activity involved in submitting data to the driver and the GPU.
Detailed application analysis using Graphics Frame Analyzer
Screenshot of the original version of the application
Immediately after the game code was ported to the x86 platform, the game ran at 59.01 FPS. It was then analyzed in detail with the Graphics Frame Analyzer in order to reduce GPU load (GPU Busy) and CPU load (CPU Load). The table shows the information obtained with the Graphics Frame Analyzer.
| Indicator | Value |
|---|---|
| Total number of primitives | 4376 |
| GPU duration | 8.58 ms |
| Time to render the frame | 9.35 ms |
This is the load that the original, non-optimized version of the game put on the system. The next table lists the draw calls that consume the most system resources.
| Type | Erg number | GPU duration | GPU memory read | GPU memory write |
|---|---|---|---|---|
| Sky | 1 | 1.43 ms | 0.2 MB | 7.6 MB |
| Grass | 5 | 1.89 ms | 9.4 MB | 8.2 MB |
Analysis and optimization aimed at reducing the load on the system
Eliminate unnecessary alpha blending
When alpha blending is used to draw objects, the program must combine, in real time, the color values of all overlapping objects and the background to determine the final color. Displaying alpha-blended colors can therefore put a greater load on the processor than displaying opaque colors, and the extra computation can hurt performance on slow devices. For this reason, it was decided to get rid of unnecessary alpha blending.
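The cost difference described above can be illustrated with a minimal Python sketch. It is a toy model rather than the game's or the GPU's actual code: the function names `write_opaque` and `write_blended` and the one-pixel framebuffer are hypothetical, and the blend shown is the common "source over" formula. What it shows is that a blended write has to read the destination pixel first, while an opaque write does not.

```python
# Toy model of per-pixel work with and without alpha blending.
# Colors are (r, g, b) tuples with components in [0.0, 1.0].
# All names here are illustrative, not taken from Unity or Intel GPA.

def write_opaque(framebuffer, x, y, src_rgb):
    """Opaque write: the old pixel value is never read."""
    framebuffer[y][x] = src_rgb


def write_blended(framebuffer, x, y, src_rgb, src_alpha):
    """'Source over' blend: result = src * a + dst * (1 - a)."""
    dst_rgb = framebuffer[y][x]  # extra read of the destination pixel
    framebuffer[y][x] = tuple(
        s * src_alpha + d * (1.0 - src_alpha)
        for s, d in zip(src_rgb, dst_rgb)
    )


BLUE, RED, GREEN = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

# Half-transparent red over a blue background: blending is needed here.
fb = [[BLUE]]
write_blended(fb, 0, 0, RED, 0.5)
half_transparent = fb[0][0]

# Fully opaque green drawn with blending enabled...
fb = [[BLUE]]
write_blended(fb, 0, 0, GREEN, 1.0)
blended_but_opaque = fb[0][0]

# ...gives the same pixel as a plain opaque write, minus the extra read.
fb = [[BLUE]]
write_opaque(fb, 0, 0, GREEN)
plain_opaque = fb[0][0]

print(half_transparent)
print(blended_but_opaque == plain_opaque)
```

When the source is fully opaque, the blended and the plain write produce the same pixel, so the destination read and the extra arithmetic buy nothing. That is the sense in which blending can be "unnecessary".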
The Graphics Frame Analyzer lets you modify draw calls. This allows a developer to experiment and see how a change affects performance without modifying the code. The feature is on the Blend State tab in the State group.
Enabling and disabling alpha blending in the Graphics Frame Analyzer without modifying the application source code
The table shows more detailed information about drawing the grass after alpha blending was turned off. As a result, GPU duration decreased by 26.0%, and memory reads decreased by 97.2%.
| Indicator | Baseline | Unnecessary alpha blending disabled (grass) |
|---|---|---|
| GPU cycles | 1466843 | 1085794.5 |
| GPU duration | 1896.6 µs | 1398.4 µs |
| GPU memory read | 9.4 MB | 0.2 MB |
| GPU memory write | 8.2 MB | 8.2 MB |
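The percentages quoted for this experiment can be rechecked from the table with a few lines of Python. This is only a worked-arithmetic sketch, and the helper name `percent_reduction` is illustrative. Note that the memory figures in the table are rounded to 0.1 MB, so the value recomputed from them is not expected to match the quoted 97.2% exactly.

```python
def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# GPU cycles for the grass draw call, before and after disabling blending.
gpu_cycles_drop = percent_reduction(1466843, 1085794.5)

# GPU memory reads in MB, as rounded in the table.
memory_read_drop = percent_reduction(9.4, 0.2)

print(f"GPU cycles:  -{gpu_cycles_drop:.1f}%")   # about 26.0%
print(f"Memory read: -{memory_read_drop:.1f}%")  # about 97.9% from rounded MB values
```

Memory writes stay at 8.2 MB, which is consistent with the idea that blending adds a destination read on top of a write that happens either way.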
Apply Z-culling effectively
When a 3D graphics card draws objects, 3D shapes in three-dimensional space (x, y, z) are projected onto a two-dimensional surface, where their position is given by x and y coordinates. The Z-buffer, or depth buffer, stores the depth (z coordinate) of each screen pixel. If two scene objects fall on the same pixel, the GPU compares their depths and overwrites the current pixel color only if the new object is closer to the viewer. Z-culling thus reproduces the familiar perception of depth: closer objects hide the ones farther away. It also improves performance, because hidden surfaces do not have to be drawn.
The game draws two kinds of environment: sky (erg number 1) and grass (erg number 5). Since most of the sky is behind the grass, a large area of the sky is never visible during the game. However, the sky is drawn first, which prevents Z-culling from being used effectively.
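To make the effect of draw order concrete, here is a minimal Python sketch of a depth test on a one-dimensional "screen". It is a toy model under stated assumptions, not the GPU's actual pipeline: the `render` function, the 10-pixel width, and the depth values are all hypothetical, and both objects are treated as opaque. It counts how many pixels each draw call writes and how many are rejected by the depth comparison.

```python
def render(draw_calls, width):
    """Draw opaque objects onto a 1D screen with a depth test.

    Each draw call is (name, depth, covered_pixels); a smaller depth
    means the object is closer to the viewer.
    """
    depth_buffer = [float("inf")] * width
    color_buffer = [None] * width
    stats = {}
    for name, depth, pixels in draw_calls:
        written = rejected = 0
        for x in pixels:
            if depth < depth_buffer[x]:   # depth test passes: closer than stored
                depth_buffer[x] = depth
                color_buffer[x] = name
                written += 1
            else:                         # hidden pixel: no color write needed
                rejected += 1
        stats[name] = {"written": written, "rejected": rejected}
    return color_buffer, stats


WIDTH = 10
sky = ("sky", 100.0, range(0, 10))    # far away, covers the whole screen
grass = ("grass", 10.0, range(0, 9))  # close, covers most of the screen

image_sky_first, stats_sky_first = render([sky, grass], WIDTH)
image_grass_first, stats_grass_first = render([grass, sky], WIDTH)

print(stats_sky_first["sky"])      # sky drawn first: every sky pixel is written
print(stats_grass_first["sky"])    # sky drawn last: most sky pixels are rejected
print(image_sky_first == image_grass_first)
```

Both orders produce the same image, but when the far object is drawn last, almost all of its pixels fail the depth test and are never written. In this toy model that is 1 write instead of 10 for the sky.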
Draw calls for the sky (erg number 1) and grass (erg number 5)
Here is how GPU duration changed after the draw order was changed.
Comparison of the load on the system before and after changing the draw order in the Graphics Frame Analyzer
The table shows more detailed information about drawing the sky after the draw order was changed. GPU duration decreased by 88%, and the amount of data written to memory decreased by about 98.9%.
| Indicator | Baseline | Draw order changed (sky) |
|---|---|---|
| GPU cycles | 1113276 | 133975 |
| GPU duration | 1443 µs | 174.2 µs |
| Early Z rejected | 0 | 2145344 |
| Samples written | 2165760 | 20416 |
| GPU memory read | 0.2 MB | 0.0 MB |
| GPU memory write | 8.2 MB | 0.1 MB |
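The sample counters in this table can be cross-checked with simple arithmetic, sketched here in Python with illustrative variable names. After the draw order change, the samples rejected by the early Z test plus the samples still written add up to the number of samples written in the baseline. In other words, the sky draw call covers the same area as before, but almost all of it is now discarded before any color is written.

```python
# Counters for the sky draw call, taken from the table above.
baseline_samples_written = 2165760
optimized_samples_written = 20416
optimized_early_z_rejected = 2145344

# Every baseline sample is either still written or rejected early.
all_samples_accounted_for = (
    optimized_samples_written + optimized_early_z_rejected
    == baseline_samples_written
)

# Share of sky samples that still reach the framebuffer, in percent.
visible_share = optimized_samples_written / baseline_samples_written * 100.0

# Reduction in GPU cycles for the same draw call, in percent.
cycles_drop = (1113276 - 133975) / 1113276 * 100.0

print(all_samples_accounted_for)
print(f"Sky samples still written: {visible_share:.2f}%")
print(f"GPU cycles: -{cycles_drop:.1f}%")
```

Less than 1% of the sky samples are still written, which lines up with the roughly 88% drop in GPU cycles for this draw call.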
Results
The table shows more detailed results of optimizing the x86 version of the game after eliminating unnecessary alpha blending and changing the draw order. GPU duration decreased by about 25%, and memory reads and writes decreased by 42.6% and 30.0%, respectively. System Analyzer showed an FPS increase of only 1.06, because Android uses vertical sync and caps the frame rate at 60 frames per second. The frame rate calculated by the Graphics Frame Analyzer, however, increased by 29.7%.
| Indicator | Baseline x86 version | Optimized version |
|---|---|---|
| GPU cycles | 6654210 | 4965478 |
| GPU duration | 8565.2 µs | 6386 µs |
| Early Z rejected | 16592 | 3348450 |
| Samples written | 6053311 | 2813997 |
| GPU memory read | 20.9 MB | 12.0 MB |
| GPU memory write | 28.6 MB | 20.0 MB |
| FPS calculated in System Analyzer | 59.01 | 60.07 |
| FPS calculated in Graphics Frame Analyzer | 120.9 | 156.8 |
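The gap between the two FPS figures can be sketched in Python. This is a simplified model, not how either tool actually measures: it assumes a hard 60 Hz cap (`REFRESH_RATE_HZ` and `displayed_fps` are illustrative names) and uses the GPU durations from the table expressed in milliseconds (about 8.57 ms and 6.39 ms per frame).

```python
REFRESH_RATE_HZ = 60.0  # assumed vsync cap

def displayed_fps(gpu_fps, cap=REFRESH_RATE_HZ):
    """With vsync on, the screen cannot show more frames than the cap."""
    return min(gpu_fps, cap)


baseline_gpu_fps, optimized_gpu_fps = 120.9, 156.8

# On screen, both versions hit the same ceiling.
baseline_on_screen = displayed_fps(baseline_gpu_fps)
optimized_on_screen = displayed_fps(optimized_gpu_fps)

# On the GPU side, the gain is clearly visible.
gpu_fps_gain = (optimized_gpu_fps - baseline_gpu_fps) / baseline_gpu_fps * 100.0

# Share of the 60 Hz frame budget (about 16.67 ms) spent on GPU work.
frame_budget_ms = 1000.0 / REFRESH_RATE_HZ
baseline_budget_used = 8.5652 / frame_budget_ms * 100.0
optimized_budget_used = 6.386 / frame_budget_ms * 100.0

print(baseline_on_screen, optimized_on_screen)
print(f"GPU-side FPS gain: {gpu_fps_gain:.1f}%")
print(f"Frame budget used: {baseline_budget_used:.1f}% -> {optimized_budget_used:.1f}%")
```

Under these assumptions the optimization shows up as headroom rather than as on-screen FPS: the GPU spends a smaller part of each 60 Hz frame working, which leaves room for heavier scenes or slower devices.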
Here are the main indicators for the baseline and optimized versions of the application, shown as a graph.
Key figures before and after optimization
Findings
When you start optimizing a game for Android on x86, first port it to the platform and then look for bottlenecks. Profiling tools help you measure performance and find GPU-related performance problems. Intel GPA is a powerful analysis tool that lets you experiment with an application's graphics without changing the source code.