Interpreting GPU OpenCL™ Application Analysis Data

If the Intel® VTune™ Amplifier shows that your application is GPU-bound and you know that the application uses OpenCL™ software technology, enable the Trace OpenCL kernels on Processor Graphics configuration option for your analysis to see how effectively the application uses OpenCL kernels. Follow these steps to explore the data that the VTune Amplifier provides for OpenCL application analysis:

Explore Summary Statistics

Start your data analysis with the Summary window, which provides application-level performance statistics. Typically, you start with the primary baseline, the Elapsed Time metric, which shows the total time your target ran:

You can correlate this data with the GPU Time used by GPU engines while your application was running:

If the GPU Time takes up a significant portion of the Elapsed Time, the application is GPU-bound. In the example above, the GPU Time spent in the Render and GPGPU engine is very small, which means that the application was using CPU resources most of the time.
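The comparison above can be sketched as a simple ratio. This is only an illustration: the function names and the 0.5 threshold are assumptions made for the example, not values or settings taken from the VTune Amplifier.

```python
def gpu_time_share(elapsed_s: float, gpu_time_s: float) -> float:
    """Return GPU Time as a fraction of Elapsed Time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return gpu_time_s / elapsed_s


def looks_gpu_bound(elapsed_s: float, gpu_time_s: float, threshold: float = 0.5) -> bool:
    """Flag a run as GPU-bound when the GPU Time share reaches the threshold.

    The default threshold is a hypothetical cut-off chosen for illustration.
    """
    return gpu_time_share(elapsed_s, gpu_time_s) >= threshold


# Hypothetical run: 10 s elapsed, 0.5 s on the GPU engine.
share = gpu_time_share(10.0, 0.5)
```

With these made-up numbers the share is small, which matches the case described above where the application spends most of its time on the CPU.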

For OpenCL applications, the VTune Amplifier provides the list of OpenCL kernels with the highest execution time:

The Summary window also displays platform information including GPU and CPU data:

The last four GPU characteristics are specific to Intel® HD Graphics.

Identify Hot GPU OpenCL™ Kernels

To view information about all OpenCL kernels running on the GPU, switch the Grouping in the Graphics window to Computing Task Purpose / Computing Task (GPU) / Instance. VTune Amplifier identifies the following computing task purposes: Compute (kernels), Transfer (OpenCL routines responsible for transferring data from the host to a GPU), and Synchronization (for example, clEnqueueBarrierWithWaitList).

The corresponding columns show the overall time a kernel ran on the GPU, the average time for a single invocation (corresponding to one call of clEnqueueNDRangeKernel), work-group sizes, and averaged GPU hardware metrics collected for the kernel. Hover over a metric column header to read the metric description and view the formula used to calculate the metric. If a metric value for a computing task exceeds the threshold set by Intel architects for that metric, the value is highlighted in pink, which signals a performance issue. Hover over such a value to read the issue description.

Every clCreateKernel call results in a line in the Compute category. If two kernels with the same name (even from the same source) were created with two clCreateKernel calls (and then invoked through two or more clEnqueueNDRangeKernel calls), two lines with the same kernel name appear in the table.

The Source Computing Task (GPU) grouping is useful for aggregating data by kernel source:
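To make the difference between per-kernel lines and per-source aggregation concrete, here is a minimal sketch. The row layout, kernel names, source identifiers, and times are all hypothetical; they do not reflect an export format of the VTune Amplifier.

```python
from collections import defaultdict

# Hypothetical rows, one per clCreateKernel call:
# (kernel name, kernel source id, total GPU time in seconds).
rows = [
    ("blur", "blur.cl", 0.25),      # first kernel object named "blur"
    ("blur", "blur.cl", 0.5),       # second kernel object, same name and source
    ("reduce", "reduce.cl", 0.125),
]


def total_time_by_source(rows):
    """Sum total GPU time over all rows that share a kernel source."""
    totals = defaultdict(float)
    for _name, source, total_s in rows:
        totals[source] += total_s
    return dict(totals)


by_source = total_time_by_source(rows)
```

Here the two "blur" lines collapse into one entry, which is the effect of grouping by source rather than by kernel object.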

Analyze and optimize hot kernels with the longest Total Time values first. These include kernels with long average time values as well as kernels whose average time is short but that are invoked more frequently than the others. Both groups deserve attention.
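The following sketch shows why both groups matter: total time is the average time of one invocation multiplied by the number of invocations, so a short but frequently invoked kernel can outrank a long, rare one. The class, field names, and numbers are illustrative assumptions, not VTune Amplifier output.

```python
from dataclasses import dataclass


@dataclass
class KernelStats:
    name: str
    average_time_ms: float  # average time of a single invocation
    instance_count: int     # number of invocations

    @property
    def total_time_ms(self) -> float:
        # Total time follows from the definition of the average.
        return self.average_time_ms * self.instance_count


def rank_by_total_time(kernels):
    """Return kernels sorted from the longest to the shortest total time."""
    return sorted(kernels, key=lambda k: k.total_time_ms, reverse=True)


# Hypothetical kernels.
kernels = [
    KernelStats("long_rare", 8.0, 4),          # 32 ms in total
    KernelStats("short_frequent", 0.5, 200),   # 100 ms in total
    KernelStats("minor", 0.25, 8),             # 2 ms in total
]
ranked = [k.name for k in rank_by_total_time(kernels)]
```

In this made-up data set the frequently invoked kernel comes first even though a single invocation of it is the cheapest of the top two.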

To correlate GPU metrics and OpenCL kernels data, explore the Timeline pane:

Explore the Computing Queue

To view details on OpenCL kernel submission, in particular to distinguish the order of submission from the order of execution and to analyze the time spent in the queue, zoom in and explore the Computing Queue data in the Timeline pane. You can click a kernel task to highlight its whole path through the queue up to the execution displayed at the top layer:

VTune Amplifier displays kernels with the same name and size in the same color. Synchronization tasks are marked with vertical hatching. Data transfers are marked with cross-diagonal hatching.
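The quantities read off the Computing Queue can be sketched with plain timestamps. The sketch assumes each task has a queued, start, and end time, similar to the CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_START, and CL_PROFILING_COMMAND_END values that OpenCL exposes through clGetEventProfilingInfo. The class, the task names, and the numbers are hypothetical, and this is not a description of how the VTune Amplifier collects its data.

```python
from dataclasses import dataclass


@dataclass
class TaskTimestamps:
    name: str
    queued_ns: int  # when the task was placed in the queue
    start_ns: int   # when execution started on the GPU
    end_ns: int     # when execution finished

    @property
    def queue_wait_ns(self) -> int:
        """Time spent in the queue before execution started."""
        return self.start_ns - self.queued_ns

    @property
    def execution_ns(self) -> int:
        """Time spent executing on the GPU."""
        return self.end_ns - self.start_ns


def submission_order(tasks):
    return [t.name for t in sorted(tasks, key=lambda t: t.queued_ns)]


def execution_order(tasks):
    return [t.name for t in sorted(tasks, key=lambda t: t.start_ns)]


# Hypothetical tasks: "B" is queued while "A" is still waiting to run.
tasks = [
    TaskTimestamps("A", queued_ns=0, start_ns=100, end_ns=400),
    TaskTimestamps("B", queued_ns=50, start_ns=400, end_ns=500),
]
waits = {t.name: t.queue_wait_ns for t in tasks}
```

In this example task "B" waits far longer in the queue than it runs, which is the kind of pattern the zoomed-in Computing Queue view helps to spot.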

