To access this pane:
Click the New Analysis button on the Intel® VTune™ Amplifier toolbar.
The New Amplifier Result tab opens with the Analysis Type window active.
Select the Algorithm Analysis > Concurrency analysis type from the analysis tree on the left pane.
The Concurrency pane opens on the right.
Use this pane to explore and edit the predefined configuration of the Concurrency analysis type. This analysis type helps you find where your application does not use the available logical CPUs effectively.
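To make "using the available logical CPUs effectively" concrete, the sketch below computes how much time an application spends with a given number of threads running at once. It is only an illustration of the concept and is not how VTune Amplifier computes its metrics; the function name `concurrency_histogram` and the sample intervals are assumptions made for this example.

```python
from collections import defaultdict

def concurrency_histogram(intervals):
    """Return {running_thread_count: total_time} for (start, end) run intervals."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a thread starts running
        events.append((end, -1))    # a thread stops running
    events.sort()  # at equal timestamps, stops (-1) sort before starts (+1)

    histogram = defaultdict(float)
    running = 0
    previous = None
    for time, delta in events:
        # Credit the elapsed time to the concurrency level that was active.
        if previous is not None and time > previous:
            histogram[running] += time - previous
        running += delta
        previous = time
    return dict(histogram)

# Hypothetical run intervals (in seconds) for three threads.
hist = concurrency_histogram([(0.0, 4.0), (1.0, 3.0), (2.0, 3.0)])
print(hist)
```

On a machine with four logical CPUs, a histogram dominated by levels 1 and 2 would suggest that the application leaves most of the hardware idle.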
Use This | To Do This |
---|---|
CPU sampling interval, ms spin box | Specify an interval between CPU samples. |
Analyze DirectX* pipeline events check box | Analyze GPU usage and frame rate based on the data provided by DirectX*, and identify whether your application is GPU-bound or CPU-bound. |
Analyze user tasks check box | Analyze tasks specified in your code via Task API. |
Analyze Intel runtimes and user synchronization check box | Analyze thread synchronization by profiling the User synchronization API used by Intel runtimes, such as OpenMP* and Intel® Threading Building Blocks (Intel TBB), or by the user. This option causes higher overhead and increases result size. |
Trace OpenCL kernels on Processor Graphics check box | Capture the execution time of OpenCL™ kernels on a GPU, identify performance-critical GPU computing tasks, and analyze the performance of OpenCL kernels per GPU hardware metrics. |
Analyze Processor Graphics hardware events drop-down menu | Analyze performance data from Intel HD Graphics based on the predefined groups of GPU metrics. |
Details button | Expand or collapse a section listing the default non-editable settings used for this analysis type. To modify these settings for the Concurrency analysis, click the Copy button in the upper right corner. VTune Amplifier creates an editable copy of this analysis type configuration and places it under the Custom Analysis branch in the analysis tree. |
The Details section provides information on the following default collection settings used for the Concurrency analysis:
Use This Option | To Do This | Default Concurrency Value |
---|---|---|
CPU sampling interval, ms | Set the interval between collected CPU samples in milliseconds. | 10 |
Collect highly accurate CPU time | Obtain more accurate CPU time data. This option causes more runtime overhead and increases result size. Administrator privileges are required. | Yes |
Collect CPU sampling data | Enable CPU sampling with stack unwinding; that is, the respective result windows and panes contain information about function call stacks. | With stacks |
Collect signaling API data | Identify synchronization transitions in the timeline and signaling call stacks for associated waits. The collector instruments signaling APIs, which causes higher overhead and increases result size. The specified option value enables stack unwinding for signaling calls; that is, the respective result windows and panes contain information about calling sequences for signaling calls. | With stacks |
Collect synchronization API data | Identify where threads are waiting, or compute thread concurrency. The collector instruments these APIs, which causes higher overhead and increases result size. The specified option value enables stack unwinding for synchronization wait calls; that is, the respective result windows and panes contain information about calling sequences for synchronization wait calls. | With stacks |
Collect I/O API data | Identify where threads are waiting, or compute thread concurrency. The collector instruments these APIs, which causes higher overhead and increases result size. The specified option value enables stack unwinding for I/O calls; that is, the respective result windows and panes contain information about calling sequences for I/O calls. | With stacks |
Analyze user tasks | Analyze tasks in your code specified via Task API. This option causes higher overhead and increases result size. | No |
Analyze Intel runtimes and user synchronization | Analyze thread synchronization by profiling the User synchronization API used by Intel runtimes, such as OpenMP* and Intel® Threading Building Blocks (Intel TBB), or by the user. This option causes higher overhead and increases result size. | No |
Analyze Processor Graphics hardware events | Analyze performance data from Intel HD Graphics based on the predefined groups of GPU metrics. | No |
Analyze DirectX* pipeline events | Analyze GPU usage and frame rate based on the data provided by DirectX*, and identify whether your application is GPU-bound or CPU-bound. | No |
Trace OpenCL kernels on Processor Graphics | Capture the execution time of OpenCL kernels on a GPU, identify performance-critical GPU computing tasks, and analyze the performance of OpenCL kernels per GPU hardware metrics. | No |
GPU sampling interval, us | Specify an interval between GPU samples. | 1000 |
Stack unwinding mode | Enable stack unwinding after collection finishes (offline mode). Offline mode reduces analysis overhead and is typically recommended. | After collection |
Stitch stacks | For applications using Intel Threading Building Blocks (Intel TBB) or OpenMP* using Intel runtime libraries, restructure the call flow to attach stacks to a point introducing a parallel workload. | Yes |
Collect timeline data | Enable collecting and retaining overhead data to display the Timeline pane. This mode increases result size. | Yes |
Collect frequency data | Collect data about processor frequency changes. This type of data collection is supported only for Linux* systems based on Intel® Xeon® processors. | No |
Collect sleep data | Analyze when the hardware wakes up from a sleep state and what causes it. This type of data collection is supported only for Linux* systems based on Intel® Xeon® processors. | No |
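The two sampling intervals in the table use different units (milliseconds for the CPU, microseconds for the GPU), which makes them easy to misread. The sketch below converts each default into a rough number of sampling ticks for a run of a given length. It is plain arithmetic under the assumption of one tick per interval; the function name `expected_samples` is hypothetical, and the number of samples VTune Amplifier actually records depends on the collector and the workload.

```python
def expected_samples(duration_s, interval, unit="ms"):
    """Rough count of sampling ticks in a run, assuming one tick per interval."""
    ticks_per_second = {"ms": 1_000, "us": 1_000_000}[unit]
    return duration_s * ticks_per_second // interval

# Defaults from the table above, applied to a hypothetical 5-second run.
cpu_ticks = expected_samples(5, 10, unit="ms")     # CPU sampling interval: 10 ms
gpu_ticks = expected_samples(5, 1000, unit="us")   # GPU sampling interval: 1000 us
print(cpu_ticks, gpu_ticks)
```

With these defaults the GPU is sampled ten times as often as the CPU, which is worth keeping in mind when you estimate result size.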
Note
You may copy the command line for this configuration using the Command Line... button at the bottom and run this analysis remotely.
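For scripted or remote runs, the copied command line can also be assembled programmatically. The following is a minimal sketch that assumes the `amplxe-cl` command-line tool with its `-collect`, `-knob`, and `-result-dir` options; the helper name `build_concurrency_command`, the application path, and the result directory name are hypothetical. Knob names are not listed here: take them from the command that the Command Line... button generates for your configuration.

```python
import shlex

def build_concurrency_command(app_argv, result_dir=None, knobs=None):
    """Assemble an argv list for a Concurrency collection (sketch, not official)."""
    argv = ["amplxe-cl", "-collect", "concurrency"]
    # Knob names and values are supplied by the caller; none are assumed here.
    for name, value in (knobs or {}).items():
        argv += ["-knob", f"{name}={value}"]
    if result_dir:
        argv += ["-result-dir", result_dir]
    # Everything after "--" is the target application and its arguments.
    return argv + ["--"] + list(app_argv)

# Hypothetical target application and result directory.
cmd = build_concurrency_command(["./my_app", "--input", "data file.txt"],
                                result_dir="r000cc")
print(shlex.join(cmd))  # shell-quoted form, suitable for pasting into a terminal
```

Building the command as a list and quoting it only for display avoids quoting mistakes when application arguments contain spaces.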