Many people like to see a program's structure as a function call graph. Such a graph is especially interesting when it also reflects the performance profile and highlights the hottest code paths.
A call graph can be built from Intel VTune Amplifier XE data, but it takes a couple of additional utilities.

First, the basic functionality. Presenting performance profile data is one of the strongest features of VTune Amplifier. You can explore the call tree on the Caller / Callee tab:
[Screenshot: the Caller / Callee tab in VTune Amplifier]

On the left is a list of the program's functions, sorted by CPU time consumed. Time is split into total time and self time. Total time includes the time of all functions called from the given one. This list shows which functions do heavy computation themselves (large self time) and which lie on the “hot path” (large total time). In the left panel each function appears only once: if it is called from different branches of the code, its total and self times from all of them are summed. This way you can determine the overall contribution of each function.
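The relationship between self time and total time can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the call tree, the function names, and the timings are made up, and this is not how VTune Amplifier computes its numbers internally. Recursive calls are ignored for simplicity.

```python
from collections import defaultdict

# Hypothetical call tree: (function name, self time in seconds, children).
# Names and timings are invented for illustration only.
tree = ("main", 0.0, [
    ("render", 0.5, [
        ("trace", 4.0, []),
        ("shade", 1.5, [("trace", 2.0, [])]),
    ]),
    ("init_buffer", 3.0, []),
])


def total_time(node):
    """Total time = the node's self time plus the total time of everything it calls."""
    _name, self_time, children = node
    return self_time + sum(total_time(child) for child in children)


def aggregate(node, table=None):
    """Sum self and total time per function name across all branches of the tree.

    Assumes no recursion: a function nested inside itself would be counted twice.
    """
    if table is None:
        table = defaultdict(lambda: {"self": 0.0, "total": 0.0})
    name, self_time, children = node
    table[name]["self"] += self_time
    table[name]["total"] += total_time(node)
    for child in children:
        aggregate(child, table)
    return table


# Print the flat list, hottest total time first, like the left panel.
for name, times in sorted(aggregate(tree).items(), key=lambda kv: -kv[1]["total"]):
    print(f"{name:12s} total={times['total']:5.1f}s self={times['self']:5.1f}s")
```

In this made-up tree, `main` and `render` have a large total time but little or no self time, so they are on the hot path without doing the heavy work themselves, while `trace` is called from two branches and its times are added together.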
In our example, 12 functions have the largest total time, all roughly equal, and zero self time. This means they are all on the “hot path”, probably calling one another. To explore this hot path, click any of these functions and look at the panels on the right.
The top right panel shows the call sequence "upward", i.e. all the callers. Each function in the tree also has its own self and total time. The root of the tree (at the top) is the function selected in the left panel. If you click another function on the left, the tree is rebuilt. We chose the function render_one_pixel, and we can see that almost all of the other 11 functions with a large total time sit in one common call chain. The tree in this panel can branch: if the function is reached through several code paths, all of them are shown, with the CPU time distributed among the branches.
The bottom right panel, as you might guess, draws the tree of called functions. That is, if the node you are interested in has a large total time and a small self time, it is worth looking at what it calls. In the screenshot above, a significant part, 9.5 of 16 seconds, is spent in the function initialize_2D_buffer; the rest comes from the branch of the trace function.
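The idea behind the two right-hand panels can be sketched as follows. This is a simplified illustration, not VTune Amplifier's implementation. The stack samples are hypothetical and only loosely modeled on the numbers mentioned above, `parse_scene` is an invented name, and the helper `callers_and_callees` looks just one level up and down, whereas the real panels show whole trees.

```python
from collections import defaultdict

# Hypothetical stack samples: (call stack from outermost to innermost frame, CPU seconds).
samples = [
    (("main", "render_one_pixel", "initialize_2D_buffer"), 9.5),
    (("main", "render_one_pixel", "trace"), 6.5),
    (("main", "parse_scene"), 1.0),
]


def callers_and_callees(samples, selected):
    """Split the time of stacks that contain `selected` by direct caller and direct callee."""
    callers = defaultdict(float)
    callees = defaultdict(float)
    for stack, seconds in samples:
        if selected not in stack:
            continue
        i = stack.index(selected)
        if i > 0:
            callers[stack[i - 1]] += seconds  # one level "up"
        if i + 1 < len(stack):
            callees[stack[i + 1]] += seconds  # one level "down"
    return dict(callers), dict(callees)


callers, callees = callers_and_callees(samples, "render_one_pixel")
print("called from:", callers)
print("calls:", callees)
```

Selecting a different function simply re-runs the same split with another root, which is what happens when you click another row in the left panel.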
Drawing a visual call graph
The Caller / Callee view in VTune Amplifier is enough to navigate the call tree and identify performance-critical functions. However, some people like to see the whole tree at once, in one picture. Some profilers present the data this way, as VTune itself did many years ago.
For fans of a sprawling call tree of hot functions, there is a way to build one.
1. Collect a VTune Amplifier profile
Everything is simple here: collect any result. The only condition is that it must contain call stacks. That is, Advanced Hotspots at the “Hotspots” level of detail will not work, because it collects no stacks. A simple Basic Hotspots analysis is quite suitable; you can run it from the GUI or from the command line:
amplxe-cl -collect hotspots -result-dir r000hs -- find_hotspots balls.dat
2. Export the result in gprof format
VTune Amplifier can present data in the format of the gprof profiler, which is needed for the subsequent conversions. This step does require the command line (the command is the same on Windows and Linux):
amplxe-cl -report gprof-cc -result-dir r000hs -format text -report-output r000hs_gprof_cc.txt

3. Convert the result to a graph with Gprof2dot
Now we need the Gprof2dot utility. It is a Python script that builds a graph in DOT format from the output of various profilers. Thanks to Jose Fonseca for creating and maintaining it.
The script can build a DOT graph not only from gprof results; it also supports VTune Amplifier, thanks to community contributors. The VTune and gprof formats are similar but not identical, so patches were needed. The main thing is that it all works now. Specify “axe” as the format and pass the report from step 2 as input:
python gprof2dot.py -f axe r000hs_gprof_cc.txt
4. Convert the DOT graph to an image
One more tool is needed here: Graphviz. It renders an image of the graph from its textual description:
python gprof2dot.py -f axe r000hs_gprof_cc.txt | "c:\Program Files (x86)\Graphviz2.38\bin\dot" -Tpng -or000hs_call_graph.png
Strictly speaking, the command in step 4 already includes step 3; the steps are split only to show which tool does what.
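If you repeat steps 2 to 4 often, they can be chained from a small Python script. The following is a minimal sketch, not an official workflow. It assumes that `amplxe-cl`, `python`, and `dot` are on the PATH and that `gprof2dot.py` is in the current directory. The helper names `build_commands` and `run_pipeline` are made up for this example; the command-line flags are the ones from the steps above, except that `dot` gets its output file as a separate `-o` argument.

```python
import subprocess


def build_commands(result_dir, gprof_txt, png_path, dot_exe="dot"):
    """Return the argument lists for steps 2, 3 and 4 (hypothetical helper)."""
    report = ["amplxe-cl", "-report", "gprof-cc", "-result-dir", result_dir,
              "-format", "text", "-report-output", gprof_txt]
    to_dot = ["python", "gprof2dot.py", "-f", "axe", gprof_txt]
    render = [dot_exe, "-Tpng", "-o", png_path]
    return report, to_dot, render


def run_pipeline(result_dir="r000hs",
                 gprof_txt="r000hs_gprof_cc.txt",
                 png_path="r000hs_call_graph.png",
                 dot_exe="dot"):
    """Run steps 2-4; step 1 (collecting the result) must already be done."""
    report, to_dot, render = build_commands(result_dir, gprof_txt, png_path, dot_exe)
    # Step 2: write the gprof-style report to a text file.
    subprocess.run(report, check=True)
    # Step 3: convert the report to DOT text, captured from stdout.
    dot_text = subprocess.run(to_dot, check=True, capture_output=True).stdout
    # Step 4: feed the DOT text to Graphviz on stdin and render a PNG.
    subprocess.run(render, input=dot_text, check=True)
```

Calling `run_pipeline()` with the default arguments would reproduce the file names used above. On Windows you may need to pass the full path to `dot.exe` as `dot_exe` if Graphviz is not on the PATH.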
Voila, now you can see the call tree visually (a fragment of the picture is shown):

Such a graph reflects the structure of function calls (not all of them, only the most computationally expensive) and the distribution of CPU time, both self and total. The redder the color, the heavier the load on the function, so the "hot path" is easy to spot. The drawbacks are the potentially large size of the whole tree and its static nature: in a PNG image you can no longer group, filter, or view source code and performance metrics as you can in VTune Amplifier. But to each their own.
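To give a feel for what ends up in the DOT file, here is a toy generator that colors nodes by load. It is only a sketch: the linear blue-to-red mapping in `heat_color` is an assumption made for illustration and is not the color scheme Gprof2dot actually uses, and the function names and percentages passed to `to_dot` are hypothetical.

```python
def heat_color(fraction):
    """Map a load fraction in [0, 1] to a hex color from blue (cold) to red (hot)."""
    fraction = max(0.0, min(1.0, fraction))
    red = round(255 * fraction)
    blue = 255 - red
    return f"#{red:02x}00{blue:02x}"


def to_dot(nodes, edges):
    """Render profile data as DOT text.

    nodes: {function name: (total time %, self time %)}
    edges: [(caller, callee), ...]
    """
    lines = ["digraph callgraph {",
             "  node [shape=box, style=filled, fontcolor=white];"]
    for name, (total_pct, self_pct) in nodes.items():
        # "\\n" emits a literal backslash-n, which DOT treats as a line break in labels.
        label = f"{name}\\n{total_pct:.1f}%\\n({self_pct:.1f}%)"
        color = heat_color(total_pct / 100)
        lines.append(f'  "{name}" [label="{label}", fillcolor="{color}"];')
    for caller, callee in edges:
        lines.append(f'  "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical data: one hot caller and one moderately loaded callee.
print(to_dot({"main": (100.0, 0.0), "trace": (40.0, 40.0)}, [("main", "trace")]))
```

The printed text can be piped to `dot -Tpng` in the same way as the Gprof2dot output.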