GPU Analysis

Use the Intel® VTune™ Amplifier to profile applications that use a Graphics Processing Unit (GPU) for rendering, video processing, and computations. VTune Amplifier can monitor, analyze, and correlate activities on both the CPU and GPU.

Track the GPU activity to:

  • Identify code regions where your application is CPU or GPU bound

  • Estimate how effectively Intel HD Graphics is used

  • Analyze hot OpenCL™ kernels and Intel® Media SDK tasks running on the GPU

Depending on your target platform, the VTune Amplifier provides GPU analysis options as follows:

Target Platform  GPU                    GPU Usage  GPU HW Metrics  OpenCL™ Kernel Analysis  Intel Media SDK Program Analysis
---------------  ---------------------  ---------  --------------  -----------------------  --------------------------------
Windows          Intel HD Graphics      yes        yes             yes                      no
Windows          non-Intel HD Graphics  yes        no              no                       no
Android          Intel HD Graphics      yes        yes             no                       no
Android          non-Intel HD Graphics  no         no              no                       no
Linux            Intel HD Graphics      yes        no              yes                      yes
Linux            non-Intel HD Graphics  no         no              no                       no
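
As an illustration only, the support table above can be encoded as a small lookup structure. This is a minimal sketch: the names `SUPPORT` and `is_supported` are hypothetical and are not part of any VTune Amplifier API; the values are transcribed from the table.

```python
# Support matrix transcribed from the table above.
# Key: (target platform, GPU kind); value: set of supported analysis options.
# SUPPORT and is_supported are hypothetical names used for illustration.
SUPPORT = {
    ("Windows", "Intel HD Graphics"): {
        "GPU Usage", "GPU HW Metrics", "OpenCL Kernel Analysis"},
    ("Windows", "non-Intel HD Graphics"): {"GPU Usage"},
    ("Android", "Intel HD Graphics"): {"GPU Usage", "GPU HW Metrics"},
    ("Android", "non-Intel HD Graphics"): set(),
    ("Linux", "Intel HD Graphics"): {
        "GPU Usage", "OpenCL Kernel Analysis",
        "Intel Media SDK Program Analysis"},
    ("Linux", "non-Intel HD Graphics"): set(),
}


def is_supported(platform, gpu, option):
    """Return True if the table lists `option` as available for the target."""
    return option in SUPPORT.get((platform, gpu), set())


if __name__ == "__main__":
    print(is_supported("Linux", "Intel HD Graphics", "GPU HW Metrics"))
    print(is_supported("Windows", "Intel HD Graphics", "GPU HW Metrics"))
```

Unknown platform and GPU combinations fall back to an empty set, so the lookup reports them as unsupported rather than raising an error.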

Note

If you run GPU analysis over a Remote Desktop connection, make sure your setup meets these requirements:

  • VTune Amplifier 2015 Update 2 or higher

  • Intel® HD Graphics driver version 15.36.14.64.4080 or higher

  • A target application that can run over a Remote Desktop Connection (RDC)

Otherwise, run the VTune Amplifier from the target computer's console or access the computer via VNC.
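
The requirements in this note can be expressed as a simple pre-flight check. The sketch below is hypothetical: `remote_desktop_ok` is not a VTune Amplifier function, and it assumes that driver version strings can be compared field by field as integers, which the source does not confirm.

```python
def parse_dotted(version):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


# Minimum versions taken from the note above.
MIN_DRIVER = parse_dotted("15.36.14.64.4080")
MIN_VTUNE = (2015, 2)  # (release year, update number)


def remote_desktop_ok(vtune_release, driver_version, app_runs_over_rdc):
    """Check the three Remote Desktop requirements listed in the note.

    Assumption: driver versions order correctly when compared
    field by field as integers.
    """
    return (tuple(vtune_release) >= MIN_VTUNE
            and parse_dotted(driver_version) >= MIN_DRIVER
            and bool(app_runs_over_rdc))


if __name__ == "__main__":
    print(remote_desktop_ok((2015, 2), "15.36.14.64.4080", True))
    print(remote_desktop_ok((2015, 1), "15.36.14.64.4080", True))
```

If the check fails, the fallback described in the note applies: run the VTune Amplifier from the target computer's console or over VNC.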

To enable GPU analysis, either run the CPU/GPU Concurrency analysis or configure a predefined or custom analysis to collect GPU usage and Processor Graphics hardware events. Optionally, if your application uses the corresponding APIs on the GPU, also trace the execution of Intel Media SDK programs and OpenCL kernels. VTune Amplifier runs the analysis and presents the collected GPU performance data in all available viewpoints.

Use the following windows to analyze the collected data:

Platform window

  • GPU usage by GPU engines over time

  • CPU time usage

  • GPU performance over time by Overview metrics

  • GPU OpenCL kernels and Intel Media SDK program execution on Processor graphics

Graphics window

  • GPU usage by GPU engines over time per thread

  • CPU time usage per thread

  • GPU performance over time by metrics selected from the Analyze Processor Graphics events drop-down menu in the Choose Analysis Type window: Overview, Compute Basic (global/local memory accesses), or Compute Extended (for Intel® Core™ M processor). CPU/GPU Concurrency analysis collects and displays these metrics by default.

  • GPU OpenCL kernels and Intel Media SDK program execution on Processor graphics

    • For OpenCL application analysis, use the Architecture Diagram provided in the Timeline pane to better understand how GPU hardware metrics are distributed across hardware blocks.

Bottom-up window/PMU Events window

  • GPU usage by GPU engines over time

  • GPU Computing Tasks corresponding to OpenCL kernels submitted and executed on Processor Graphics

GPU and CPU Usage Correlation

Note

To monitor general GPU usage over time, run the VTune Amplifier as an Administrator.

Configure the VTune Amplifier to explore GPU busyness over time and understand whether your application is CPU or GPU bound.

In theory, your application is GPU bound if the Metrics Over Time tab of the Timeline pane in the Graphics window shows that the GPU is busy most of the time with only small idle gaps between busy intervals, and the GPU software queue rarely drops to zero. If the gaps between busy intervals are large and the CPU is busy during these gaps, your application is CPU bound. Such clear-cut situations are rare, however, and you usually need a detailed analysis to understand all dependencies. For example, an application may be mistaken for GPU bound when its use of GPU engines is serialized (for example, when the engines responsible for video processing and for rendering are loaded in turns). In this case, the inefficient scheduling on the GPU is caused by the application code running on the CPU.
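
The rule of thumb above can be sketched as a small classifier over exported timeline data. Everything here is an assumption made for illustration: the input format (sorted, non-overlapping `(start, end)` busy intervals plus sampled queue depths), the function names, and the threshold values are hypothetical and do not come from the VTune Amplifier. The sketch also does not detect the serialized-engine case described above, so treat its answer as a first hint only.

```python
def total_length(intervals):
    """Sum the lengths of (start, end) intervals."""
    return sum(end - start for start, end in intervals)


def gaps(busy, t_start, t_end):
    """Idle gaps left by sorted, non-overlapping busy intervals in [t_start, t_end]."""
    result = []
    cursor = t_start
    for start, end in busy:
        if start > cursor:
            result.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < t_end:
        result.append((cursor, t_end))
    return result


def overlap(a, b):
    """Total time covered by both interval lists (each sorted, non-overlapping)."""
    total = 0.0
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def classify(gpu_busy, cpu_busy, queue_depths, t_start, t_end,
             busy_threshold=0.9, empty_queue_threshold=0.05,
             cpu_in_gap_threshold=0.8):
    """Apply the rule of thumb; all thresholds are illustrative assumptions."""
    gpu_fraction = total_length(gpu_busy) / (t_end - t_start)
    empty_fraction = sum(1 for d in queue_depths if d == 0) / len(queue_depths)
    if gpu_fraction >= busy_threshold and empty_fraction <= empty_queue_threshold:
        return "likely GPU bound"
    gpu_gaps = gaps(gpu_busy, t_start, t_end)
    gap_time = total_length(gpu_gaps)
    if gap_time > 0 and overlap(gpu_gaps, cpu_busy) / gap_time >= cpu_in_gap_threshold:
        return "likely CPU bound"
    return "inconclusive"


if __name__ == "__main__":
    print(classify([(0, 48), (50, 100)], [(0, 10)], [3, 2, 4, 1], 0, 100))
    print(classify([(0, 10), (60, 70)], [(10, 60), (70, 100)], [0, 1, 0, 0], 0, 100))
```

The "inconclusive" result is deliberate: it covers the common case where neither condition clearly holds and a detailed analysis is needed.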

When the GPU is heavily loaded over time, you can look deeper to determine whether it is used effectively and whether there is room for improvement. This analysis relies on the hardware metrics that the VTune Amplifier collects for the Render and GPGPU engine of Intel HD Graphics.

Intel HD Graphics Render Engine and Hardware Metrics

A GPU is a highly parallel machine where graphical or computational work is done by an array of small cores, or execution units (EUs). Each EU runs several lightweight threads simultaneously. When some of these threads stall while waiting for data from memory or other units, the EU can pick another thread for execution and thereby hide the stalls.

To use the full potential of the GPU, applications should enable the scheduling of as many threads as possible and minimize idle cycles. Minimizing stalls is also very important for graphics and general purpose computing GPU applications.
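
A toy model can illustrate why scheduling more threads helps hide stalls. The sketch below assumes, purely for illustration, that each resident thread is stalled independently with the same probability at any given cycle, so an EU has nothing to run only when all of its threads are stalled at once. Real hardware behaves differently (stalls are correlated, and the number of threads per EU is fixed by the architecture), and the function name is hypothetical.

```python
def all_threads_stalled_fraction(stall_probability, resident_threads):
    """Fraction of cycles in which every resident thread is stalled.

    Toy model: threads stall independently with the same probability.
    """
    if not 0.0 <= stall_probability <= 1.0:
        raise ValueError("stall_probability must be between 0 and 1")
    if resident_threads < 1:
        raise ValueError("resident_threads must be at least 1")
    return stall_probability ** resident_threads


if __name__ == "__main__":
    # Hypothetical case: each thread waits on memory half of the time.
    for n in (1, 2, 4, 7):
        fraction = all_threads_stalled_fraction(0.5, n)
        print(f"{n} thread(s): EU fully stalled {fraction:.1%} of cycles")
```

Under these assumptions, the share of cycles with no runnable thread falls quickly as more threads become available to the EU.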

VTune Amplifier can monitor Intel GPU hardware events and display metrics that summarize GPU resource usage over a sampled period, for example, the ratio of cycles when EUs were idle, stalled, or active, as well as statistics on memory accesses and other functional units. If the VTune Amplifier traces the execution of GPU OpenCL kernels, it annotates each kernel with GPU metrics.

The scheme below displays metrics collected by the VTune Amplifier across different parts of the Intel HD Graphics:

GPU metrics help identify how efficiently GPU hardware resources are used and whether any performance improvements are possible. Many metrics are represented as a ratio of cycles when the GPU functional unit(s) is in a specific state over all the cycles available for a sampling period. To see a formula used for a metric calculation, hover over a corresponding column name in the grid. For example, the VTune Amplifier collects data for the following basic GPU hardware metrics:

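Metric            Formula
----------------  -----------------------
EU Array Active   [formula image missing]
EU Array Stalled  [formula image missing]
EU Array Idle     [formula image missing]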
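
Following the general description above (a metric is the ratio of cycles spent in a given state to all cycles in the sampling period), the three EU Array metrics can be sketched as follows. This is an assumption-based illustration, not the exact formulas used by the VTune Amplifier: the function name and inputs are hypothetical, and the sketch assumes that active, stalled, and idle cycles together account for every cycle in the sample.

```python
def eu_array_metrics(active_cycles, stalled_cycles, idle_cycles):
    """Return each EU state as a fraction of all cycles in the sample.

    Assumption: the three states together cover every cycle in the sample.
    """
    total = active_cycles + stalled_cycles + idle_cycles
    if total <= 0:
        raise ValueError("the sample must contain at least one cycle")
    return {
        "EU Array Active": active_cycles / total,
        "EU Array Stalled": stalled_cycles / total,
        "EU Array Idle": idle_cycles / total,
    }


if __name__ == "__main__":
    # Hypothetical cycle counts for one sampling period.
    for name, value in eu_array_metrics(600, 300, 100).items():
        print(f"{name}: {value:.0%}")
```

A high stalled or idle share in such a breakdown is the kind of signal that suggests room for improvement in how the application feeds the GPU.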
