Interpreting General Exploration Data

When the General Exploration analysis completes, the Intel® VTune™ Amplifier opens the General Exploration viewpoint. This viewpoint displays data as event-based metrics. Depending on your microarchitecture, metrics can be organized by execution categories so that you can easily identify which portion of the pipeline is responsible for the majority of execution time. For example, for Intel microarchitecture code name Sandy Bridge, the VTune Amplifier analyzes execution categories based on this flow:

Identifying Where Cycles Are Spent

The four leaf categories serve as high-level performance metrics in the General Exploration viewpoint.

Each metric is an event ratio defined by Intel architects and has its own predefined threshold. VTune Amplifier analyzes the ratio value for each aggregated program unit (for example, a function). When this value exceeds the threshold and the program unit accounts for more than 5% of the total collected CPU time, the VTune Amplifier treats it as a potential performance problem and highlights the value in pink.
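The highlighting rule described above can be sketched in a few lines. This is only an illustration, not the VTune Amplifier implementation: the function name `is_flagged`, its parameters, and the sample threshold of 0.1 are hypothetical. Only the 5% CPU-time cutoff comes from the text.

```python
def is_flagged(metric_value, threshold, unit_cpu_time, total_cpu_time,
               min_share=0.05):
    """Return True if a program unit's metric value would be highlighted.

    Both conditions from the text must hold:
      1. the metric value exceeds the metric's predefined threshold, and
      2. the unit takes more than 5% of the collected CPU time.
    All names and arguments here are illustrative assumptions.
    """
    if total_cpu_time <= 0:
        return False  # no CPU time collected, nothing to flag
    cpu_share = unit_cpu_time / total_cpu_time
    return metric_value > threshold and cpu_share > min_share


# Hypothetical inputs: metric value 0.273 against an assumed threshold of 0.1.
print(is_flagged(0.273, 0.1, unit_cpu_time=12.0, total_cpu_time=100.0))
print(is_flagged(0.273, 0.1, unit_cpu_time=3.0, total_cpu_time=100.0))
```

The second call returns False even though the metric exceeds its threshold, because the unit takes only 3% of the CPU time and so is not worth flagging.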

Note

For a detailed tuning methodology based on the events and metrics in General Exploration, visit https://software.intel.com/en-us/articles/processor-specific-performance-analysis-papers.

To interpret the performance data provided during the advanced event-based sampling analysis, you may follow the steps below:

  1. Learn metrics and define a performance baseline.

  2. Identify hardware issues.

  3. Analyze source.

  4. Explore other analysis types/viewpoints.

Learn Metrics and Define a Performance Baseline

In the Hardware Issues viewpoint, click the Summary tab to switch to the Summary window. The first section displays the summary statistics on the overall application execution per hardware-related metrics. Metrics are organized by execution categories:

General Exploration Viewpoint: Summary Window

All metric names are hyperlinks. Clicking such a hyperlink opens the Bottom-up window and sorts the data in the grid by the selected metric.

To view the metric description, mouse over the help icon:

In the example above, mousing over the DTLB Overhead metric displays the metric description in the tooltip. The value for this metric is highlighted in pink, which signals a performance issue for the whole application execution. VTune Amplifier describes the detected performance issue under the metric entry. If the description is lengthy, it may be truncated. Hover over the truncated text to see the full description.

You may use the performance issues identified by the VTune Amplifier as a baseline for comparison of versions before and after optimization. Your primary performance indicator is the Elapsed time value.

Identify Hardware Issues

To view hardware issues per program unit, switch to the Bottom-up pane. Each row represents a program unit and the percentage of time used by this unit. Program units that take more than 5% of the CPU time are considered hotspots. By default, the VTune Amplifier sorts the data in descending order by Clockticks, so the hotspots appear at the top of the list.
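As a rough sketch of this ordering, the snippet below ranks program units by Clockticks and keeps those above the 5% cutoff. It assumes that a unit's share of Clockticks stands in for its share of CPU time. The function name `find_hotspots` and all sample counts are hypothetical and are not taken from a real result.

```python
def find_hotspots(clockticks_by_unit, min_share=0.05):
    """Rank program units by Clockticks and keep those above the cutoff.

    clockticks_by_unit maps a program-unit name to its Clockticks count.
    Returns (name, share) pairs in descending order of Clockticks.
    """
    total = sum(clockticks_by_unit.values())
    if total == 0:
        return []
    ranked = sorted(clockticks_by_unit.items(),
                    key=lambda item: item[1], reverse=True)
    return [(name, ticks / total)
            for name, ticks in ranked
            if ticks / total > min_share]


# Hypothetical counts, for illustration only.
sample = {"check_matrix": 700, "init": 250, "log": 30, "main": 20}
hotspots = find_hotspots(sample)
for name, share in hotspots:
    print(f"{name}: {share:.1%}")
```

With these sample counts, only `check_matrix` and `init` pass the 5% cutoff; the remaining units fall below it and are dropped from the list.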

Each column in the Bottom-up pane represents a hardware performance metric. VTune Amplifier calculates a metric based on the formula provided by Intel architects. Mouse over the column header to read the metric description and view the formula. By default, metric values are represented as Bars. You can change the representation mode with the Show Data As context menu option.

Each metric has a threshold value. If the metric value exceeds the threshold and the program unit is a hotspot, the VTune Amplifier highlights this value in pink as performance-critical. Mouse over each pink cell to read a description of the issue, recommended solution (if any), and the formula used to calculate a threshold value for the issue.

General Exploration Viewpoint: Bottom-up Window

In the example above, the VTune Amplifier identified the check_matrix function as the biggest hotspot, that is, the function that took the most CPU time. VTune Amplifier detected issues with micro-operations that were not issued. For unfilled pipeline slots, the VTune Amplifier identified issues with Memory Latency, which often stem from latency in the memory hierarchy. For example, about 27% (0.273) of the check_matrix execution time was spent handling DTLB Overhead. This means that if you focus on this hotspot and optimize it, you can potentially gain a performance boost of up to 27% for this function.
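The arithmetic behind this estimate can be sketched as follows. The sketch assumes that the metric value is the fraction of the function's time lost to the issue and that the issue could be removed entirely, so the result is an optimistic upper bound. The function name `max_time_saved` and the 10-second function time are hypothetical; only the 0.273 value comes from the example.

```python
def max_time_saved(metric_fraction, function_time):
    """Upper bound on the time recoverable by removing an issue entirely.

    metric_fraction: share of the function's time attributed to the issue
                     (0.273 for DTLB Overhead in the example).
    function_time:   time spent in the function, in any consistent unit.
    """
    if not 0.0 <= metric_fraction <= 1.0:
        raise ValueError("metric_fraction must be between 0 and 1")
    return metric_fraction * function_time


dtlb_overhead = 0.273       # value from the example above
function_seconds = 10.0     # hypothetical time spent in check_matrix
saved = max_time_saved(dtlb_overhead, function_seconds)
remaining = function_seconds - saved
print(f"at most {saved:.2f} s saved, at least {remaining:.2f} s remaining")
```

In practice the gain is usually smaller than this bound, because an optimization rarely removes an overhead completely.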

VTune Amplifier is able to identify the most common types of pipeline bottlenecks. You may go deeper for more details. If the deeper levels of the metrics do not display any data, it means that issues detected at the upper level cannot be covered by the VTune Amplifier metrics. For example, the VTune Amplifier detected back-end bound issues in the first four functions in the analysis result below:

If you expand the Back-end Bound column, you see that back-end bound issues are not caused by Memory Latency issues analyzed by the VTune Amplifier:

Analyze Source

Once you have identified a critical function, double-click it to open the Source/Assembly window and analyze the source code. The Source/Assembly window displays event data. Focus on the events included in the hardware metric identified as performance-critical in the Bottom-up pane (see the calculation formula for the metric). You may sort the columns so that the required event data appears leftmost, or set the required event column as the Data of Interest. VTune Amplifier remembers your settings and restores them each time you open your result.

Explore Other Analysis Types/Viewpoints

  • The General Exploration analysis for Intel microarchitecture code name Sandy Bridge collects information about many parts of the core pipeline and memory system. You may use the collected data to extend the analysis and focus on particular events. For example, if you see Branch Mispredict issues during the General Exploration analysis, you may run the Branch analysis type to get more detailed information on the branch issues. You may also create your own analysis configuration and monitor the events you are interested in.

  • You may view the collected data using the Hotspots viewpoint or run the Basic Hotspots analysis type. Analyzing the source and assembly code for the hotspot function in the Hotspots viewpoint helps identify which instruction contributes most to the poor performance and how much CPU time the hotspot source line takes. Such a code analysis could be useful for the hotspots that do not show any issues in the sub-metrics but do show problems at the upper level of metrics (see the example above).

  • Run the comparison analysis to understand the performance gain you obtain after your optimization.
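The before/after comparison can be reduced to a simple calculation on the Elapsed time values, which the text names as the primary performance indicator. The sketch below is an assumption about how you might quantify the gain yourself; it is not how the VTune Amplifier comparison mode is implemented. The function name `elapsed_time_gain` and the sample timings are hypothetical.

```python
def elapsed_time_gain(baseline_seconds, optimized_seconds):
    """Relative reduction in Elapsed time after an optimization.

    Returns a positive fraction for an improvement and a negative one
    for a regression.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (baseline_seconds - optimized_seconds) / baseline_seconds


# Hypothetical Elapsed time values from two runs of the same workload.
gain = elapsed_time_gain(baseline_seconds=20.0, optimized_seconds=15.0)
print(f"Elapsed time reduced by {gain:.0%}")
```

Comparing against the same baseline after each change keeps the measurements consistent across optimization steps.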

Note

For information on processor events, see the Intel Processor Event Reference available from the Help menu.
