
Continuing the series of articles on Intel INDE, I want to talk about a unique (I am not afraid of the word) developer tool created by Intel: Intel GPA (Graphics Performance Analyzers). I already mentioned it in the review article on Intel INDE.
Intel GPA includes tools for analyzing performance, finding bottlenecks, and optimizing application performance. At the same time, the tools have an intuitive, user-friendly graphical interface, which gives developers the opportunity to start "right off the bat", even without extensive experience in optimizing and debugging graphical applications.
Intel GPA supports performance analysis of applications for Windows and Android. The Windows version (DirectX) has been on the market for a long time, and many articles have been written about it. The Android version, intended for applications that use OpenGL, appeared relatively recently, and many developers are not even aware of its existence. I am going to fill this gap.
System requirements
Intel GPA is available for the following host operating systems:
- Windows 7, 8, 8.1
- OS X
- Ubuntu
Mobile device
You also need an Android device based on Intel Atom, because the application is analyzed directly on the device. No rooting or other special preparation of the device is required; the only requirement is that the device is recognized by ADB (Android Debug Bridge).
Application
The profiled application must have the debuggable="true" flag set in the manifest:
<application android:debuggable="true" … />
and the android.permission.INTERNET permission:
<uses-permission android:name="android.permission.INTERNET" />
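Both requirements can be checked before profiling. The following is a minimal sketch, assuming you have the source (text) AndroidManifest.xml rather than the compiled binary one inside an APK; the function name check_manifest and the sample manifest are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace behind the android: attribute prefix in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def check_manifest(manifest_xml: str) -> dict:
    """Report whether the two manifest requirements described above are met."""
    root = ET.fromstring(manifest_xml)
    app = root.find("application")
    debuggable = (
        app is not None
        and app.get(f"{{{ANDROID_NS}}}debuggable", "false").lower() == "true"
    )
    permissions = {
        node.get(f"{{{ANDROID_NS}}}name")
        for node in root.findall("uses-permission")
    }
    return {
        "debuggable": debuggable,
        "internet": "android.permission.INTERNET" in permissions,
    }


# Hypothetical manifest, used only for illustration.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET" />
    <application android:debuggable="true" android:label="Demo" />
</manifest>
"""

print(check_manifest(SAMPLE_MANIFEST))
```

If either value comes back False, fix the manifest and rebuild before profiling.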
Installing Intel GPA
There are two ways to install Intel GPA:
As part of Intel INDE
Go to the Intel INDE website, download and install the package manager (more on this in the review article on Intel INDE). Choose GPA System Analyzer, click Download, wait for the download to complete, and install it.
From the Intel GPA home page
Open the Intel GPA home page, choose the package for your OS, download it, and install it.
What's inside
GPA Performance Analyzers and GPA Frame Analyzer are installed along with GPA System Analyzer. All the tools in the package deserve attention, but to avoid overloading the article with information about all three, this time I will talk only about GPA System Analyzer.
Getting started
First, connect the device to the host and make sure that it appears in the ADB device list. From the command line:
adb devices
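If you want to automate this check, the text printed by adb devices is easy to parse. Below is a small sketch, assuming the usual output layout (a "List of devices attached" header followed by one serial and state per line); the helper names and the sample serial numbers are invented for the example.

```python
def parse_adb_devices(output: str) -> list[tuple[str, str]]:
    """Parse the text printed by `adb devices` into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header and adb daemon start-up messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_devices(output: str) -> list[str]:
    """Return serials whose state is 'device', i.e. online and authorized."""
    return [serial for serial, state in parse_adb_devices(output) if state == "device"]


# Made-up sample output; the serial numbers are placeholders.
SAMPLE_OUTPUT = """List of devices attached
0123456789ABCDEF\tdevice
FEDCBA9876543210\tunauthorized

"""

print(ready_devices(SAMPLE_OUTPUT))
```

To use it on a real host, pass in the standard output of the adb devices command, for example captured with subprocess.run. A device in the "unauthorized" or "offline" state will not be usable until it shows up as "device".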
GPA System Analyzer
This tool allows you to analyze application performance, find potential bottlenecks, and capture metrics for more detailed analysis. It displays various performance characteristics of the mobile platform, such as:
- CPU status
- Memory
- Power usage
- Network and disk activity
- GPU metrics
- OpenGL ES API metrics
GPA System Analyzer allows you to "play" with different rendering parameters without making changes to the code. The result is immediately visible on the screen of the mobile device, and the graphs (CPU load, GPU, FPS) show how the change affects performance. In addition, the tool allows you to capture performance data for the current frame, which can then be analyzed in more detail using GPA Performance Analyzers and GPA Frame Analyzer.
So, run GPA System Analyzer. After launch, a list of available devices should be displayed:
If the device is recognized by ADB but is not listed, try specifying the path to ADB manually. To do this, press Ctrl + F1 and enter the path to the folder containing ADB.
After connecting to the device, you will see a list of applications installed on the device:
To launch an application and start analyzing it, simply click its name in the list.
After running the application on your mobile device, you will see the following screen.
The column on the left displays the available metrics (Metrics) and the rendering options (State Overrides). On the right are graphs that display performance metrics (in this case, the CPU load of the analyzed application and FPS).
To add a graph for a performance metric, simply drag the line with its name onto the graph area.
You can also combine two or more metrics on one graph (useful for monitoring related metrics) by holding down the Ctrl key while dragging the metric onto the graph area:
Metrics
CPU
- Aggregated CPU Load - total CPU load
- CPU 0x Load - the load of each core
- CPU 0x Frequency - the current frequency of each core
- Target App CPU Load - the CPU load produced by the profiled application
Device IO
The metrics listed below take into account read-write operations by all applications on the device, without reference to the application being profiled.
- Disk Read - the speed of data read operations from the drive (internal memory or SD card).
- Disk Write - write speed to the drive.
- Network RX - the speed of receiving data from the network (including all network interfaces except loopback )
- Network TX - the speed of sending data over the network
GPU
- GPU Frequency - the GPU frequency during monitoring
- TA Load - the load of the Tile Accelerator, in percent
- PB Primitives / Second - the number of graphics primitives loaded into the Parameter Buffer per second
- PB Vertices / Second - the number of vertices loaded into the Parameter Buffer per second
- PB Vertices / Primitive - the number of vertices per primitive, a summary parameter indicating how efficiently primitives are constructed
- ISP Load - the time the Image Synthesis Processor spends on removing hidden surfaces, in percent
- TSP Load - the time the Texture Shading Processor spends on texturing visible fragments, in percent
- USSE Vertex Load - the time the Universal Scalable Shader Engine (USSE) spends processing vertices, in percent
- USSE Pixel Load - the time the USSE spends processing pixels, in percent
- USSE Total Load - the time the USSE spends executing instructions, in percent
- USSE Vertices / Second - the number of vertices processed by the USSE per second
Memory
- App Resident Memory - the amount of memory ( RAM ) used by the application being analyzed.
- Available Memory - the amount of mobile device memory available to the application.
OpenGL
- Buffer Creations - the number of buffers created by the application per frame
- Draw Calls - the number of draw function calls per frame
- Indexed Draw Calls - the same as Draw Calls, but for indexed primitives
- Vertex Count - the number of vertices involved in rendering the scene, per frame
- Indexed Vertex Count - the same as Vertex Count, but for indexed vertices
- FPS - the number of frames per second
- Frame Time - the time taken to draw one frame
- Frame Number - the number of the current frame
- RT Clears - how many times glClear was called for a render target per frame
- Error Gets - the number of glGetError calls per frame
- Total State Changes - the total number of OpenGL state changes per frame
- Bind Texture Calls - the number of glBindTexture calls per frame
- Bind Buffer Calls - the number of glBindBuffer calls per frame
- Use Program Calls - the number of glUseProgram calls per frame
Power
- Battery Voltage - the current battery voltage
- Current Charging - the current, in milliamps, that the device draws over USB to charge the battery
- Current Discharging - the current, in milliamps, that the device draws from the battery
- Power Charging - the power, in watts, consumed by the battery while charging
- Power Discharging - the power, in watts, consumed by the device
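The voltage, current and power metrics are related by the usual formula P = V * I. The sketch below shows the unit conversion; whether GPA System Analyzer derives its power values exactly this way is an assumption, and the readings are invented.

```python
def power_watts(voltage_v: float, current_ma: float) -> float:
    """Electrical power in watts from volts and milliamps (P = V * I)."""
    return voltage_v * (current_ma / 1000.0)


def energy_wh(power_w: float, seconds: float) -> float:
    """Energy in watt-hours used when drawing power_w for the given time."""
    return power_w * seconds / 3600.0


# Hypothetical readings, not taken from a real device.
battery_voltage = 3.8        # volts
current_discharging = 500.0  # milliamps

power = power_watts(battery_voltage, current_discharging)
print(f"{power:.2f} W, {energy_wh(power, 600):.3f} Wh over 10 minutes")
```

Comparing the energy used over the same scenario before and after an optimization gives a simple way to judge whether the change also helps battery life.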
Finding problems and ways to improve performance with metrics
As you can see, GPA System Analyzer helps you monitor almost any indicator of application performance. But not every metric points to a problem directly (as CPU load does: the higher, the worse); some of them tell you something useful only when compared with others.
GPU Performance Metrics
TA Load and USSE Vertex Load
Ideally, the two indicators should be balanced, which gives the best performance.
TA Load high, USSE Vertex Load low - the scene contains too many vertices; you can improve performance by simplifying objects.
TA Load low, USSE Vertex Load high - the vertex shader is too complex; there is room for optimizing the shader code.
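The two rules above can be written down as a simple check over sampled values. This is only a sketch: the function name is made up, and the 70% and 30% thresholds for "high" and "low" are arbitrary assumptions, since the article gives no numeric boundaries.

```python
def diagnose_vertex_stage(ta_load: float, usse_vertex_load: float,
                          high: float = 70.0, low: float = 30.0) -> str:
    """Apply the TA Load / USSE Vertex Load rules of thumb.

    Both loads are percentages; `high` and `low` are assumed thresholds.
    """
    if ta_load >= high and usse_vertex_load <= low:
        return "too many vertices: simplify the geometry"
    if ta_load <= low and usse_vertex_load >= high:
        return "vertex shader too complex: optimize the shader code"
    return "no clear imbalance"


# Hypothetical samples read off the graphs.
print(diagnose_vertex_stage(ta_load=85.0, usse_vertex_load=20.0))
print(diagnose_vertex_stage(ta_load=15.0, usse_vertex_load=90.0))
print(diagnose_vertex_stage(ta_load=50.0, usse_vertex_load=55.0))
```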
PB Primitives / Second
A very high value indicates that the problem is most likely in the size of the vertex format.
PB Vertices / Second
A high value may indicate a large amount of data passed between the vertex and fragment shaders.
PB Vertices / Primitive
A high value indicates that you can optimize by reducing the number of vertices in models, for example by reusing them through an index buffer.
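To illustrate vertex reuse, here is a minimal sketch that turns a flat triangle list into a list of unique vertices plus an index buffer. The function name and the quad data are illustrative; a real model would carry more attributes per vertex (normals, texture coordinates), and only vertices that match in all attributes can be shared.

```python
def build_index_buffer(vertices):
    """Deduplicate a flat triangle list into unique vertices and indices."""
    unique = []   # each distinct vertex stored once
    lookup = {}   # vertex -> position in `unique`
    indices = []  # one index per original vertex
    for vertex in vertices:
        if vertex not in lookup:
            lookup[vertex] = len(unique)
            unique.append(vertex)
        indices.append(lookup[vertex])
    return unique, indices


# A quad drawn as two triangles: six vertices, two of them repeated.
quad = [
    (0.0, 0.0), (1.0, 0.0), (1.0, 1.0),
    (0.0, 0.0), (1.0, 1.0), (0.0, 1.0),
]

unique, indices = build_index_buffer(quad)
triangles = len(quad) // 3
print("vertices per primitive before:", len(quad) / triangles)
print("vertices per primitive after: ", len(unique) / triangles)
print("index buffer:", indices)
```

For the quad, the stored vertices drop from six to four, so the vertices-per-primitive ratio falls from 3.0 to 2.0. On large meshes, where most vertices are shared by several triangles, the saving is usually bigger.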
ISP Load
A high value can occur when a single Z-buffer is shared by multiple render targets (RT). To improve the situation, you can create a separate buffer for each RT.
TSP Load, Texture Unit Load, USSE Pixel Load
A high TSP Load indicates that performance can be improved by optimizing shaders (high USSE Pixel Load) or textures (high Texture Unit Load), for example by reducing their resolution or using compression.
USSE Total Load, USSE Vertex Load, USSE Pixel Load
A high USSE Total Load indicates that performance can be improved by optimizing the vertex shaders (high USSE Vertex Load) or fragment shaders (high USSE Pixel Load).
OpenGL metrics
Draw Calls & Indexed Draw Calls
In terms of performance, calling drawing functions is a costly operation. High values may indicate that performance can be improved by grouping vertices and drawing them with a single call.
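One common way to group vertices is to merge meshes that share the same material, so that each group can be submitted with one draw call. The sketch below only models the bookkeeping; the mesh data and material names are invented, and it assumes the merged meshes are static and already in the same coordinate space.

```python
from collections import defaultdict


def batch_by_material(meshes):
    """Merge the vertices of meshes that share a material into one batch."""
    batches = defaultdict(list)
    for material, vertices in meshes:
        batches[material].extend(vertices)
    return dict(batches)


# Hypothetical scene: (material, triangle vertices) pairs.
meshes = [
    ("grass", [(0, 0), (1, 0), (1, 1)]),
    ("stone", [(2, 0), (3, 0), (3, 1)]),
    ("grass", [(4, 0), (5, 0), (5, 1)]),
    ("grass", [(6, 0), (7, 0), (7, 1)]),
]

batches = batch_by_material(meshes)
print("draw calls before:", len(meshes))
print("draw calls after: ", len(batches))
```

Here four draw calls become two, one per material. After such a change, the Draw Calls metric should drop while Vertex Count stays the same.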
Buffer Creations
Buffer allocation is a costly operation and ideally should happen during scene initialization. If this metric shows up on the graph, you can make the code more efficient by moving the code that creates buffers to initialization or scene loading.
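The difference between the two approaches can be shown with a toy model. FakeGpu below is a hypothetical stand-in that only counts allocations; it is not a real graphics API, and the render functions are simplified to the allocation pattern alone.

```python
class FakeGpu:
    """Stand-in for a graphics API that only counts buffer allocations."""

    def __init__(self):
        self.buffer_creations = 0

    def create_buffer(self, size: int) -> bytearray:
        self.buffer_creations += 1
        return bytearray(size)


def render_naive(gpu: FakeGpu, frames: int, size: int) -> None:
    """Allocates a fresh buffer on every frame."""
    for frame in range(frames):
        buffer = gpu.create_buffer(size)
        buffer[0] = frame % 256  # pretend to upload per-frame data


def render_preallocated(gpu: FakeGpu, frames: int, size: int) -> None:
    """Allocates once during initialization and reuses the buffer."""
    buffer = gpu.create_buffer(size)
    for frame in range(frames):
        buffer[0] = frame % 256  # overwrite the contents in place


naive, reused = FakeGpu(), FakeGpu()
render_naive(naive, frames=100, size=1024)
render_preallocated(reused, frames=100, size=1024)
print(naive.buffer_creations, reused.buffer_creations)
```

In the second variant the allocation count no longer grows with the number of frames, which is what a flat Buffer Creations graph would reflect.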
Error Gets
glGetError calls degrade performance. In the final version of your application, this indicator should be zero.
State Overrides
Another interesting feature is the ability to override the state of the analyzed application without making changes to the code. In essence, these are experiments that you can run on your application in order to understand how enabling certain options affects its performance.
Disable All
Disables all active options. Displays the active scene as is.
1x1 Scissor Rect
Disables pixel processing in the graphics pipeline (in this case, an empty scene will be displayed). If FPS does not change, then the problem is most likely in overly complex scene geometry or the vertex shader.
Disable Alpha Blending
Disables alpha blending. Transparency operations can seriously affect performance. This experiment shows how disabling blending affects FPS.
Disable Draw Calls
Ignores drawing functions. This experiment will help you understand how an application will behave on a device with an infinitely fast graphics chip.
Disable Z-Test
The Z-buffer is used to cull objects that are located completely or partially behind other objects in the scene. Enabling this option should slow down the drawing of the scene. If it does not, you can improve performance by sorting objects from near to far before drawing them.
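Sorting from near to far can be as simple as ordering objects by their distance from the camera before submitting them. The sketch below assumes each object is represented by a single position (for example, the center of its bounding volume); the object names and coordinates are invented.

```python
import math


def sort_front_to_back(objects, camera):
    """Order objects by distance from the camera, nearest first."""
    return sorted(objects, key=lambda obj: math.dist(obj["position"], camera))


# Hypothetical scene objects with world-space positions.
scene = [
    {"name": "tree", "position": (0.0, 0.0, 10.0)},
    {"name": "rock", "position": (0.0, 0.0, 2.0)},
    {"name": "house", "position": (0.0, 0.0, 5.0)},
]
camera = (0.0, 0.0, 0.0)

for obj in sort_front_to_back(scene, camera):
    print(obj["name"])
```

This ordering applies to opaque geometry: with nearer objects drawn first, fragments of farther objects fail the depth test early and are not shaded. Objects drawn with alpha blending are normally handled separately and drawn from far to near.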
Show wireframe
Enables wireframe display mode, which allows you to visually assess the order of the objects and the complexity of the models.
Simple Fragment Shader
Replaces the fragment shader with a simple, single-color one. If performance improves when this option is enabled, try optimizing the fragment shader code.
Texture 2x2
Replaces the textures in use with simple 2x2 ones. If performance improves when you turn on this option, you can optimize the application by optimizing textures (reducing resolution, using compression).
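To get a feeling for how much resolution and format matter, you can estimate the memory a texture occupies. The sketch below is a rough calculation that ignores mipmaps and block padding; the bits-per-pixel values are the standard ones for these formats (32 for RGBA8888, 16 for RGB565, 4 for ETC1), while the texture sizes are just examples.

```python
# Bits per pixel for a few common texture formats.
BITS_PER_PIXEL = {
    "RGBA8888": 32,
    "RGB565": 16,
    "ETC1": 4,  # block-compressed, no alpha channel
}


def texture_bytes(width: int, height: int, fmt: str) -> int:
    """Approximate size of one texture level in bytes."""
    return width * height * BITS_PER_PIXEL[fmt] // 8


for fmt in BITS_PER_PIXEL:
    size_kib = texture_bytes(1024, 1024, fmt) / 1024
    print(f"1024x1024 {fmt}: {size_kib:.0f} KiB")

half = texture_bytes(512, 512, "RGBA8888") / 1024
print(f" 512x512  RGBA8888: {half:.0f} KiB")
```

Halving the resolution cuts the size by a factor of four, and switching an opaque 1024x1024 texture from RGBA8888 to ETC1 cuts it by a factor of eight, which reduces the memory bandwidth the texture units need.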
This is where the GPA System Analyzer part ends. I hope this information will help you quickly master the tool and put the knowledge gained into practice.
Next time I will talk about a tool called GPA Frame Debugger, which allows you to conduct a detailed analysis of OpenGL scenes in a simple and intuitive form.