
7 Intel® VTune Amplifier XE New Features





VTune Amplifier XE has long been known to users for its in-depth analysis of software performance, both at the application level and at the micro-architecture level.



The tool does not stand still: it is actively developed, improved, and extended with new functionality. This post gives a brief overview of the new features that appeared in VTune Amplifier XE 2013, released in September, and in its subsequent updates.






Loop analysis



As you know, optimization in high-performance computing often revolves around loops: parallelization, data layout changes for better cache use, and vectorization all target them. VTune Amplifier XE 2013 update 3 can determine which “hot spots” are actually “hot loops”. The user can now concentrate on optimizing those loops instead of hunting for them in the source code, and can assess the effect of optimizing each particular loop from run to run.







Text search



VTune Amplifier XE profiling results can be quite voluminous, and it can be hard to find the function, module, synchronization object, or other item you are interested in right away.

Users can now finally run a simple text search in all the main windows: bottom-up, top-down, source view, and assembly view.







Energy efficiency analysis



The energy efficiency of applications is becoming increasingly important. Software, not just hardware, is responsible for wasted energy. VTune Amplifier XE 2013 introduced two new analysis types in this area (currently available only on Linux).



The CPU Frequency analysis tracks how the clock frequency changes on all cores while the program runs. This gives an estimate of active power consumption, which you can then use when experimenting with frequency-scaling technologies such as Turbo Boost and SpeedStep.







The CPU Sleep States analysis provides estimates of passive power consumption, that is, of transitions between C-states. It tracks “wake-ups”. Entering a deeper C-state and leaving it both have a cost, so if this happens too often, it makes sense to think about changing the situation. The analysis shows the frequency of state transitions, statistics on the time spent in each state and, most importantly, the objects causing unwanted wake-ups, for example timers.







EBS analysis with stacks



VTune Amplifier XE uses two basic approaches to profiling. The first is based on binary instrumentation of the process being analyzed and is called “user-level analysis”. The second, event-based sampling (EBS) analysis, works not with the process but with the performance monitoring unit (PMU) in the processor, which makes it possible to profile not only user applications but also the operating system and drivers. It can also reveal micro-architectural problems in software.



Before the 2013 version, EBS analysis could find functions and their code, but not call stacks. This was inconvenient when, for example, a system function turned out to be a hot spot: finding the user code responsible for it was not easy. Since the 2013 version, EBS-based analyses also provide call stacks. You can now trace the path from system calls and drivers back to your application, if it is involved.







Another nice addition is the statistical estimate of the number of function calls. You can now see not only the total time spent executing a function, but also how that time adds up: from many short calls or from a few long ones.







Java application profiling



Starting with the 2013 version, VTune Amplifier XE supports profiling Java applications. This is especially useful for analyzing “mixed” code that combines Java with calls to “native” modules (for example, to perform heavy calculations). In addition, the profiler can detect micro-architectural problems in Java code, such as inefficient use of the cache. Read more in this article (in English).







API for custom tasks



Many parallel applications are built on “tasks”: small logical units of work that are executed by threads. The Intel Threading Building Blocks library, for example, is built on tasks.



The API provided by VTune Amplifier XE (the __itt API) has recently been extended with calls for marking up such tasks:



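#include <windows.h>
#include "ittnotify.h"

#define NUM_THREADS 4  /* assumed value; not given in the original listing */

/* Domain and task-name handles; the strings passed below are illustrative. */
__itt_domain* domain;
__itt_string_handle* UserTask;
__itt_string_handle* UserSubTask;

void do_foo(double seconds);  /* workload, defined elsewhere */

DWORD WINAPI work(void *pArg)
{
    /* Mark the work done by each thread as a subtask. */
    __itt_task_begin(domain, __itt_null, __itt_null, UserSubTask);
    do_foo(1);
    __itt_task_end(domain);
    return 0;
}

int main()
{
    int i = 0;
    HANDLE hThread[NUM_THREADS];

    domain = __itt_domain_create("Example.Domain");
    UserTask = __itt_string_handle_create("UserTask");
    UserSubTask = __itt_string_handle_create("UserSubTask");

    /* Mark the whole run, including thread creation and waiting, as one task. */
    __itt_task_begin(domain, __itt_null, __itt_null, UserTask);
    do_foo(0.3);
    for (i = 0; i < NUM_THREADS; i++)
    {
        hThread[i] = CreateThread(NULL, 0, work, (void*)(INT_PTR)i, 0, 0);
    }
    WaitForMultipleObjects(NUM_THREADS, hThread, TRUE, INFINITE);
    __itt_task_end(domain);
    return 0;
}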


Custom tasks can now be tracked through the “Task Type / Function / Call Stack” grouping and in the Tasks window.











Read more about the task API in this post.



Command-line interface improvements



VTune Amplifier XE is not only a beautiful GUI, but also an advanced command-line interface. If you view profiling results on the command line (or use this output in automated testing), you may have noticed that sometimes not everything fits in the terminal window and the formatting breaks. The 2013 version added the report-width parameter, which limits the width of the printed results so that everything fits:



 $ amplxe-cl -R hotspots -report-width=100 -r r004cc 


For fans of GNU gprof, results can now also be presented in a gprof-like format:



 $ amplxe-cl -report gprof-cc -r r001hs 


When you create your own EBS analysis with the set of events you need, you may have to run it on another machine. If that machine has no graphical interface, you can copy the command line. Until recently, a custom analysis like this also required copying a configuration file, which meant extra steps, understanding how to use the file, and so on.



In the latest updates of VTune Amplifier XE this has become easier: all parameters are contained in the command line itself. Just click “Command line” in the GUI and copy the generated string to the remote machine. No extra files are needed.







Summary



All seven new features described in this post are available in the latest version, Intel VTune Amplifier XE 2013 update 3. Download it, try it, enjoy. The trial version can be downloaded from the official site.

Source: https://habr.com/ru/post/162521/


