Use the Intel® VTune™ Amplifier to profile applications that use a Graphics Processing Unit (GPU) for rendering, video processing, and computations. VTune Amplifier can monitor, analyze, and correlate activities on both the CPU and GPU.
Track the GPU activity to:
- Identify code regions where your application is CPU or GPU bound
- Estimate how effectively the Intel HD Graphics is used
- Analyze hot OpenCL™ kernels and Intel® Media SDK tasks running on the GPU
Depending on your target platform, the VTune Amplifier provides the following GPU analysis options:
Target Platform | GPU | GPU Usage | GPU HW Metrics | OpenCL™ Kernel Analysis | Intel Media SDK Program Analysis |
---|---|---|---|---|---|
Windows | Intel HD Graphics | yes | yes | yes | no |
Windows | non-Intel HD Graphics | yes | no | no | no |
Android | Intel HD Graphics | yes | yes | no | no |
Android | non-Intel HD Graphics | no | no | no | no |
Linux | Intel HD Graphics | yes | no | yes | yes |
Linux | non-Intel HD Graphics | no | no | no | no |
Note
If you run GPU analysis via a Remote Desktop Connection (RDC), make sure your software meets these requirements:
- VTune Amplifier 2015 Update 2 or higher
- Intel® HD Graphics driver version 15.36.14.64.4080 or higher
- the target application can be run via RDC
Otherwise, run the VTune Amplifier from the target computer's console or access the computer via VNC.
To enable GPU analysis, either run the CPU/GPU Concurrency analysis or configure a predefined or custom analysis type to analyze GPU usage and Processor Graphics hardware events. Optionally, you can also trace the execution of Intel Media SDK programs and OpenCL kernels on the GPU if your application uses the corresponding APIs. The VTune Amplifier then starts the analysis and displays the collected GPU performance data in all available viewpoints.
Note
To monitor general GPU usage over time, run the VTune Amplifier as an Administrator.
Configure the VTune Amplifier to explore GPU busyness over time and understand whether your application is CPU or GPU bound.
In theory, if the Timeline pane on the Metrics Over Time tab of the Graphics window shows that the GPU is busy most of the time, with only small idle gaps between busy intervals, and the GPU software queue rarely drops to zero, your application is GPU bound. If the gaps between busy intervals are large and the CPU is busy during these gaps, your application is CPU bound. Such clear-cut situations are rare, however, and you usually need a detailed analysis to understand all dependencies. For example, an application may be mistakenly considered GPU bound when GPU engine usage is serialized, that is, when the GPU engines responsible for video processing and for rendering are loaded in turns. In this case, the ineffective scheduling on the GPU is caused by the application code running on the CPU.
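The rule of thumb above can be sketched as a small classifier over per-interval timeline samples. This is a minimal sketch under stated assumptions, not a VTune Amplifier feature: the `Interval` record, its fields, and all thresholds (`busy_threshold`, `empty_queue_threshold`, `big_gap`, `cpu_in_gaps_threshold`) are hypothetical choices made for illustration, and the VTune Amplifier does not export data in this form. Anything outside the two clear-cut patterns, such as serialized GPU engines, is reported as inconclusive rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Interval:
    """One sampling interval (hypothetical record, not a VTune export format)."""
    gpu_busy: bool        # was the GPU engine busy in this interval?
    cpu_busy: bool        # were the application's CPU threads busy?
    gpu_queue_depth: int  # packets waiting in the GPU software queue


def idle_gaps(intervals: Sequence[Interval]) -> List[int]:
    """Lengths (in intervals) of contiguous runs where the GPU was idle."""
    gaps, run = [], 0
    for iv in intervals:
        if iv.gpu_busy:
            if run:
                gaps.append(run)
            run = 0
        else:
            run += 1
    if run:
        gaps.append(run)
    return gaps


def classify(intervals: Sequence[Interval],
             busy_threshold: float = 0.9,         # hypothetical: "busy most of the time"
             empty_queue_threshold: float = 0.1,  # hypothetical: queue "rarely" empty
             big_gap: int = 5,                    # hypothetical: mean gap counted as "big"
             cpu_in_gaps_threshold: float = 0.8) -> str:
    if not intervals:
        return "inconclusive"
    n = len(intervals)
    gpu_busy_ratio = sum(iv.gpu_busy for iv in intervals) / n
    empty_queue_ratio = sum(iv.gpu_queue_depth == 0 for iv in intervals) / n
    # GPU busy almost all the time and its queue rarely drains: GPU bound.
    if gpu_busy_ratio >= busy_threshold and empty_queue_ratio <= empty_queue_threshold:
        return "GPU bound"
    # Large GPU idle gaps during which the CPU is busy: CPU bound.
    gaps = idle_gaps(intervals)
    if gaps and sum(gaps) / len(gaps) >= big_gap:
        idle = [iv for iv in intervals if not iv.gpu_busy]
        cpu_busy_in_gaps = sum(iv.cpu_busy for iv in idle) / len(idle)
        if cpu_busy_in_gaps >= cpu_in_gaps_threshold:
            return "CPU bound"
    # Mixed patterns (for example, serialized GPU engines) need a detailed analysis.
    return "inconclusive"
```

For example, 100 intervals of `Interval(True, False, 2)` classify as GPU bound, while a repeating pattern of 5 busy intervals followed by 10 GPU-idle intervals with a busy CPU classifies as CPU bound. Using the mean gap length is only one possible reading of "large gaps"; the longest gap or the total idle time would be equally defensible choices.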
When the GPU is heavily loaded over time, you can look deeper to understand whether it is used effectively and whether there is room for improvement. This analysis relies on the hardware metrics that the VTune Amplifier collects for the Render and GPGPU engine of Intel HD Graphics.
Intel HD Graphics Render Engine and Hardware Metrics
A GPU is a highly parallel machine where graphical or computational work is done by an array of small cores, or execution units (EUs). Each EU runs several lightweight threads concurrently. While some of these threads are stalled waiting for data from memory or other units, the EU can pick another thread for execution and so hide those stalls.
To use the full potential of the GPU, applications should enable the scheduling of as many threads as possible and minimize idle cycles. Minimizing stalls is equally important for graphics and general-purpose GPU computing applications.
The VTune Amplifier can monitor Intel GPU hardware events and display metrics on overall GPU resource usage over a sampling period, for example, the ratio of cycles when EUs were idle, stalled, or active, as well as statistics on memory accesses and other functional units. If the VTune Amplifier traces the execution of OpenCL kernels on the GPU, it annotates each kernel with GPU metrics.
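The toy model below illustrates two ideas from this section: how extra threads on an EU hide memory stalls, and how a metric is derived as the share of cycles spent in one state. It is a simplified sketch under stated assumptions, not a model of real Intel HD Graphics scheduling: one EU, identical threads that alternate `compute_cycles` of work with a fixed `memory_latency` stall, and a switch-on-stall scheduler. All names and parameters are hypothetical. Cycles in which threads are loaded but all of them wait for memory count as stalled; idle cycles (no threads scheduled) are not modeled.

```python
from typing import Tuple


def simulate_eu(num_threads: int, compute_cycles: int,
                memory_latency: int, total_cycles: int) -> Tuple[float, float]:
    """Toy model of one EU (hypothetical, assumes num_threads >= 1).

    Returns (active_ratio, stalled_ratio): the number of cycles in each state
    divided by all cycles in the simulated period.
    """
    compute_left = [compute_cycles] * num_threads  # work left before the next memory access
    stall_left = [0] * num_threads                 # cycles left waiting for memory
    current = 0
    active = stalled = 0
    for _ in range(total_cycles):
        # Keep the current thread while it is ready; otherwise switch to the
        # next ready thread in round-robin order.
        picked = None
        for k in range(num_threads):
            i = (current + k) % num_threads
            if stall_left[i] == 0:
                picked = i
                break
        # Outstanding memory requests make progress every cycle.
        for i in range(num_threads):
            if stall_left[i] > 0:
                stall_left[i] -= 1
        if picked is None:
            stalled += 1  # threads are loaded, but all of them wait for memory
            continue
        current = picked
        active += 1
        compute_left[picked] -= 1
        if compute_left[picked] == 0:  # issue a memory access and stall
            compute_left[picked] = compute_cycles
            stall_left[picked] = memory_latency
    return active / total_cycles, stalled / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        act, stl = simulate_eu(n, compute_cycles=2, memory_latency=6, total_cycles=800)
        print(f"{n} threads: active {act:.2f}, stalled {stl:.2f}")
```

With `compute_cycles=2` and `memory_latency=6`, the active ratio grows from 0.25 with one thread to 0.5 with two and reaches 1.0 with four. In this model, about (compute + latency) / compute threads are needed to hide the memory latency completely; real hardware adds many effects the sketch ignores.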
The scheme below displays metrics collected by the VTune Amplifier across different parts of the Intel HD Graphics:
GPU metrics help identify how efficiently GPU hardware resources are used and whether any performance improvements are possible. Many metrics are expressed as the ratio of cycles during which one or more GPU functional units are in a specific state to all cycles available in the sampling period. To see the formula used to calculate a metric, hover over the corresponding column name in the grid. For example, the VTune Amplifier collects data for the following basic GPU hardware metrics: