Once the results are collected, you can open them in the graphical or command line interface of the Intel® VTune™ Amplifier.
To view the results in the command line interface:
Use the -report option. To list all available VTune Amplifier reports, enter amplxe-cl -help report.
To view the results in the graphical interface:
Launch the amplxe-gui <result path> command or launch the amplxe-gui tool.
Click the menu button, select Open > Result..., and browse to the required result file (*.amplxe).
Tip
You may copy a result to another system and view it there (for example, to open a result collected on a Linux* cluster on a Windows* workstation).
VTune Amplifier classifies MPI functions as system functions, in the same way as Intel Threading Building Blocks (Intel TBB) and OpenMP* functions. This approach helps you focus on your own code rather than on MPI internals. To display the system functions and analyze the internals of the MPI implementation, use the Call Stack Mode combo box on the filter bar in the VTune Amplifier GUI, or the call-stack-mode option in the CLI. The User functions+1 call stack mode is especially useful for finding the MPI functions that consumed the most CPU Time (Hotspots analysis) or waited the longest (Locks and Waits analysis). For example, in the call chain main() -> foo() -> MPI_Bar() -> MPI_Bar_Impl() -> ..., MPI_Bar() is the actual MPI API function you call, and the deeper functions are MPI implementation details. The call stack modes behave as follows:
The default Only user functions call stack mode attributes the time spent in the MPI calls to the user function foo() so that you can see which of your functions you can change to actually improve the performance.
The User functions+1 mode attributes the time spent in the MPI implementation to the top-level system function, MPI_Bar(), so that you can easily spot unusually expensive MPI calls.
The User/system functions mode shows the call tree without any re-attribution so that you can see where exactly in the MPI library the time was spent.
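The three modes can be illustrated with a small Python model of the call chain from the example. This is only a sketch of the attribution rules described here, not VTune Amplifier code: the attribute() function, the mode strings, and the (name, is_system) frame representation are hypothetical names chosen for illustration, and the sketch assumes the stack contains at least one user function.

```python
from typing import List, Tuple

# A frame is (function name, is_system). Hypothetical representation.
Frame = Tuple[str, bool]


def attribute(stack: List[Frame], mode: str) -> str:
    """Return the function under which the leaf frame's time is reported.

    `stack` is ordered from the outermost caller to the leaf.
    Assumes at least one user (non-system) frame is present.
    """
    if mode == "user/system functions":
        # No re-attribution: time stays with the function that spent it.
        return stack[-1][0]

    user_indexes = [i for i, (_, is_system) in enumerate(stack) if not is_system]
    if not user_indexes:
        raise ValueError("stack has no user function")
    deepest_user = user_indexes[-1]

    if mode == "only user functions":
        # Time in system code is charged to the calling user function.
        return stack[deepest_user][0]
    if mode == "user functions+1":
        # Time is charged to the first system function called from user code.
        if deepest_user + 1 < len(stack):
            return stack[deepest_user + 1][0]
        return stack[deepest_user][0]
    raise ValueError(f"unknown mode: {mode}")


# main() -> foo() -> MPI_Bar() -> MPI_Bar_Impl()
chain = [
    ("main", False),
    ("foo", False),
    ("MPI_Bar", True),
    ("MPI_Bar_Impl", True),
]

for mode in ("only user functions", "user functions+1", "user/system functions"):
    print(f"{mode}: {attribute(chain, mode)}")
```

For this chain, the model reports foo, MPI_Bar, and MPI_Bar_Impl respectively, matching the three descriptions above.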
VTune Amplifier supports Intel TBB and OpenMP. It is recommended to combine these thread-level parallel solutions with MPI-style parallelism to maximize CPU resource usage across the cluster, and to use VTune Amplifier to analyze the performance of that level of parallelism. The MPI, OpenMP, and Intel TBB features in VTune Amplifier are functionally independent, so all the usual OpenMP and Intel TBB support features apply when you examine a result collected for an MPI process.
Example
This example displays the performance report for functions and modules analyzed for Hotspots. Note that this example opens individual analysis results, each of which was collected for a specific rank of the MPI process (foo.14 and foo.15):
$ amplxe-cl -R hotspots -q -format text -r foo.14
Function  Module  CPU Time
--------  ------  --------
f         a.out   6.070
main      a.out   2.990

$ amplxe-cl -R hotspots -q -format text -group-by module -r foo.14
Module  CPU Time
------  --------
a.out   9.060
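The module-level figure in the example is the sum of the per-function CPU Time values for that module (6.070 + 2.990 = 9.060). The following Python sketch reproduces that grouping from the function-level text report. It is an illustration only: cpu_time_by_module() is a hypothetical helper, and the parsing assumes the simple three-column, whitespace-separated layout shown in the example, with no spaces inside function or module names.

```python
from collections import defaultdict

# Function-level report text, copied from the example above.
REPORT = """\
Function  Module  CPU Time
--------  ------  --------
f         a.out   6.070
main      a.out   2.990
"""


def cpu_time_by_module(report_text: str) -> dict:
    """Sum CPU Time per module from a three-column text report."""
    totals = defaultdict(float)
    # Skip the header line and the dashed separator line.
    for line in report_text.splitlines()[2:]:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or unexpected lines
        _function, module, cpu_time = parts
        totals[module] += float(cpu_time)
    return dict(totals)


totals = cpu_time_by_module(REPORT)
for module, seconds in totals.items():
    print(f"{module}  {seconds:.3f}")
```

In practice, the -group-by module option shown above is the direct way to get this view; the sketch only makes the relationship between the two reports explicit.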
See Also
Supplemental documentation specific to a particular Intel Studio may be available at <install-dir>\<studio>\documentation\.