Intel® VTune™ Amplifier introduces the following basic categories of performance analysis:
Algorithm Analysis
Microarchitecture and Platform Analysis
Each category represents a branch in the performance analysis tree. You can run any of the available analysis types either from the product graphical interface or from the command-line interface (amplxe-cl).
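As a minimal sketch of driving a collection from a script, the helper below assembles an amplxe-cl command line without running it. The helper name `build_collect_command`, the target `./my_app`, and the result directory `r000hs` are hypothetical. The analysis type name `hotspots` is an assumption, because the accepted names vary between product versions; check the command-line help of your installed version.

```python
import shlex


def build_collect_command(analysis_type, target, result_dir=None):
    """Return the argv list for an amplxe-cl collection run (not executed here)."""
    argv = ["amplxe-cl", "-collect", analysis_type]
    if result_dir is not None:
        argv += ["-result-dir", result_dir]
    # "--" separates amplxe-cl options from the target and its arguments.
    argv.append("--")
    argv += list(target)
    return argv


# Hypothetical target and result directory, for illustration only.
cmd = build_collect_command("hotspots", ["./my_app", "--input", "data.bin"],
                            result_dir="r000hs")
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.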
Depending on your analysis target, you may also configure the selected performance analysis category for:
All analysis types in the VTune Amplifier are based on one of the following data collection types:
User-mode sampling and tracing collection
Hardware event-based sampling collection, optionally extended with the stack collection
Each analysis type provides a set of performance metrics that helps you sort out the problems in your code and understand how to optimize it.
VTune Amplifier supports local and remote collection modes via GUI and command line. With VTune Amplifier XE, you can collect data on remote Linux* systems. With VTune Amplifier for Systems, you can collect data on remote Android* and Linux platforms.
All analysis types provided by the VTune Amplifier in the Analysis Type window are predefined by Intel architects and contain a default set of options. You may create custom analysis types based on the available collection types or based on the existing predefined analysis configurations.
Algorithm Analysis
The algorithm analysis branch introduces analysis types targeted for software tuning. You run the analysis and use the collected data to understand the inefficiencies in your current algorithms, and improve application performance. Algorithm analysis includes the following analysis types:
Advanced Hotspots: Event-based sampling analysis that monitors all the software executing on your system including the operating system modules. The collector interrupts the processor at the specified sampling interval and collects samples of instruction addresses.
Basic Hotspots: Performance analysis based on user-mode sampling and tracing collection. It focuses on a particular target, identifies functions that took the most CPU time to execute, restores the call tree for each function, and shows thread activity.
Concurrency: Performance analysis based on user-mode sampling and tracing collection. It focuses on a particular target, identifies functions that took the most CPU time to execute, and shows how well your application is threaded for the existing number of logical CPUs. In Intel System Studio, this analysis type does not support remote data collection on Android systems.
Locks and Waits: Performance analysis based on user-mode sampling and tracing collection that helps identify the synchronization objects that might cause ineffective CPU usage. In Intel System Studio, this analysis type does not support remote data collection on Android systems.
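The sampling idea behind the hotspot analyses above can be illustrated with a conceptual sketch: each sample is an instruction address, and each sample stands for roughly one sampling interval of CPU time. This is not how VTune Amplifier is implemented. The symbol table, addresses, and the function `attribute_samples` are hypothetical, and the sketch assumes each function extends up to the start of the next symbol.

```python
from bisect import bisect_right
from collections import Counter


def attribute_samples(sample_addresses, symbols, interval_ms):
    """Estimate per-function CPU time (ms) from sampled instruction addresses.

    symbols: list of (start_address, function_name), sorted by start address.
    """
    starts = [start for start, _ in symbols]
    counts = Counter()
    for addr in sample_addresses:
        # Find the last symbol whose start address is <= the sampled address.
        idx = bisect_right(starts, addr) - 1
        name = symbols[idx][1] if idx >= 0 else "[unknown]"
        counts[name] += 1
    # Each sample represents roughly one sampling interval of CPU time.
    return {name: n * interval_ms for name, n in counts.items()}


# Hypothetical symbol table and samples, for illustration only.
symbols = [(0x1000, "parse"), (0x2000, "compute"), (0x3000, "write")]
samples = [0x1004, 0x2010, 0x2020, 0x2FF0, 0x3008]
estimate = attribute_samples(samples, symbols, interval_ms=1)
print(estimate)  # {'parse': 1, 'compute': 3, 'write': 1}
```

The function with the most samples is the hotspot candidate. A shorter interval gives finer resolution at the cost of higher collection overhead.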
Microarchitecture and Platform Analysis
The Microarchitecture Analysis branch introduces a set of CPU-specific analysis types based on the event-based sampling data collection. You can use only the hardware-level analysis types targeted for your system. For a list of CPUs based on a particular microarchitecture, select Help > Intel Processor Event Reference in the standalone interface, or Help > Intel VTune Amplifier version > Intel Processor Event Reference in the Microsoft Visual Studio* IDE.
Analysis types shared between all supported CPUs:
General Exploration: Event-based analysis that helps identify the most significant hardware issues affecting the performance of your application. Consider this analysis type as a starting point when you do hardware-level analysis.
Bandwidth: Event-based analysis that measures the data read and written to DRAM via the processor's integrated memory controller and helps determine whether the code is saturating available bandwidth.
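To make the bandwidth saturation check concrete, the sketch below converts memory controller transaction counts into GB/s and compares the result with a peak value. It rests on assumptions not stated in this document: each transaction moves one 64-byte cache line, and the counts, elapsed time, and peak bandwidth are hypothetical inputs you would take from your own measurements and platform specification.

```python
CACHE_LINE_BYTES = 64  # Assumption: one memory transaction moves one 64-byte line.


def dram_bandwidth_gbs(read_transactions, write_transactions, elapsed_s):
    """Return combined DRAM read and write bandwidth in GB/s (1 GB = 1e9 bytes)."""
    total_bytes = (read_transactions + write_transactions) * CACHE_LINE_BYTES
    return total_bytes / elapsed_s / 1e9


def bandwidth_utilization(measured_gbs, peak_gbs):
    """Return the fraction of peak bandwidth that the measured value represents."""
    return measured_gbs / peak_gbs


# Hypothetical counts over a one-second window, for illustration only.
measured = dram_bandwidth_gbs(150_000_000, 50_000_000, elapsed_s=1.0)
utilization = bandwidth_utilization(measured, peak_gbs=25.6)
print(f"{measured:.1f} GB/s, {utilization:.0%} of peak")
```

A utilization close to 1.0 suggests the code is limited by memory bandwidth rather than by computation.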
Intel Core 2 Processor Analysis:
Bandwidth Breakdown: Event-based analysis that helps understand the contribution of different components of bus transactions (simple reads, reads for ownership, and write-backs) to the code performance.
Cycles and uOps: Event-based analysis that helps identify several typical performance issues in the core pipeline.
Memory Access: Event-based analysis that helps identify memory access issues and their performance impact, such as accessing too many memory pages and load-driven L1 data cache and L2 cache misses.
Nehalem / Westmere Analysis:
Cycles and uOps: Event-based analysis that helps identify performance issues in the core pipeline and understand the application execution flow.
Front End Investigation: Event-based analysis that helps identify instruction-delivery-related performance issues.
Memory Access: Event-based analysis that helps understand memory access patterns and identify how to optimize them.
Read Bandwidth: Event-based analysis that measures the data read from DRAM via the processor's integrated memory controller and helps determine whether the code is saturating available bandwidth.
Write Bandwidth: Event-based analysis that measures the data written to DRAM via the processor's integrated memory controller and helps determine whether the code is saturating available bandwidth.
Sandy Bridge Analysis:
Access Contention: Event-based sampling analysis that helps you understand where shared memory cache line contention, lock contention, and true and false sharing issues affect the performance of your application.
Branch Analysis: Event-based sampling analysis that helps identify branching issues that may lead to wasted work, increasing application runtime and power consumption.
Client Analysis: Event-based sampling analysis that helps identify hardware issues affecting client applications.
Core Port Saturation: Event-based sampling analysis that helps understand how port saturation affects the performance of your application at a per-core granularity. Look for port/cycle counts above 0.7.
Cycles and uOps: Event-based sampling analysis that helps identify performance issues in the core pipeline and understand the application execution flow.
Memory Access: Event-based sampling analysis that uses precise events and helps understand where memory access issues affect the performance of your application.
Port Saturation: Event-based sampling analysis that helps analyze how port saturation affects the performance of your application at a per-thread granularity. Look for port/cycle counts above 0.7.
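The 0.7 threshold mentioned for the port saturation analyses can be applied as in the sketch below. It interprets "port/cycle count" as the number of micro-operations dispatched on a port divided by the number of cycles, which is an assumption. The port numbers, counts, and the function `saturated_ports` are hypothetical.

```python
PORT_SATURATION_THRESHOLD = 0.7  # Threshold taken from the text above.


def saturated_ports(port_uops, cycles):
    """Return {port: ratio} for ports whose uops-per-cycle ratio exceeds the threshold."""
    ratios = {port: uops / cycles for port, uops in port_uops.items()}
    return {port: r for port, r in ratios.items() if r > PORT_SATURATION_THRESHOLD}


# Hypothetical per-port dispatch counts over one million cycles.
counts = {0: 820_000, 1: 400_000, 5: 710_000}
flagged = saturated_ports(counts, cycles=1_000_000)
for port, ratio in sorted(flagged.items()):
    print(f"port {port}: {ratio:.2f} uops/cycle")
```

With these inputs, ports 0 and 5 are flagged and port 1 is not.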
Haswell Analysis:
TSX Exploration: Event-based sampling analysis that helps analyze Intel Transactional Synchronization Extensions (Intel TSX) usage and the causes of transactional aborts.
Intel Xeon Phi Coprocessor (code name: Knights Corner) Analysis
If you selected the Intel Xeon Phi coprocessor (native) or Intel Xeon Phi coprocessor (host launch) target system type when configuring your analysis target, the VTune Amplifier XE displays a list of analysis types available on the Intel Xeon Phi coprocessor:
Algorithm Analysis > Hotspots: Event-based sampling analysis that monitors all the software executing on your system including the operating system modules. The collector interrupts the processor at the specified sampling interval and collects samples of instruction addresses. Consider this analysis type as a starting point when you do hardware-level analysis on the Intel Xeon Phi coprocessor (code name: Knights Corner).
Microarchitecture Analysis > General Exploration: Event-based analysis that helps identify where microarchitectural issues affect the performance of your application. In addition to the default set of events, this analysis type can collect a set of metrics to analyze cache usage and vectorization intensity.
Microarchitecture Analysis > Bandwidth: Event-based analysis that helps understand where bandwidth issues affect the performance of your application.