
Vectorization Advisor-Windows* OS


Intel® Advisor 2016 offers a vectorization analysis tool and a threading design and prototyping tool to help ensure your Fortran, C, and C++ (native and managed) applications take full performance advantage of today’s processors. This topic discusses getting started with the Vectorization Advisor GUI on a Windows* platform.


Key Features

Key Vectorization Advisor features include the following:

  • Survey Report - Offers integrated compiler report data and performance data all in one place. Use the Survey Report to help identify:

    • Where vectorization will pay off the most

    • If vectorized loops are providing benefit, and if not, why not

    • Un-vectorized and under-vectorized loops, and the estimated expected performance gain of vectorization or better vectorization

    • How data accessed by vectorized loops is organized and the estimated expected performance gain of reorganization

    The Survey Report also provides:

    • Friendly, code-specific recommendations for how to fix vectorization issues

    • Quick visibility into source code and assembly code

  • Trip Counts analysis - Dynamically identifies how many times loops are invoked (sometimes called call count or loop count) and how many times they execute (sometimes called iteration count). Use this added information in the Survey Report to make better decisions about your vectorization strategy for particular loops, as well as to optimize already-parallel loops.

  • Dependencies Report - For safety purposes, the compiler is often conservative when assuming data dependencies. Use a Dependencies-focused Refinement Report to check for real data dependencies in loops the compiler did not vectorize because of assumed dependencies. If real dependencies are detected, the analysis can provide additional details to help resolve the dependencies. Your objective: Identify and better characterize real data dependencies that could make forced vectorization unsafe.

  • Memory Access Patterns (MAP) Report - Use a MAP-focused Refinement Report to check for various memory issues, such as non-contiguous memory accesses and unit stride vs. non-unit stride accesses. Your objective: Eliminate issues that could lead to significant vector code execution slowdown or block automatic vectorization by the compiler.

Prerequisites

To build applications that produce the most accurate and complete Vectorization Advisor analysis results, build an optimized binary of your application in release mode using these settings:

To Do This

Optimal C/C++ Settings

Request full debug information (compiler and linker).

Command line:

  • /Zi

  • /DEBUG

Microsoft Visual Studio* IDE:

  • C/C++> General> Debug Information Format> Program Database (/Zi)

  • Linker> Debugging> Generate Debug Info> Yes (/DEBUG)

Request moderate optimization.

Command line: /O2 or higher

Visual Studio* IDE: C/C++> Optimization> Optimization> Maximize Speed (/O2) or higher

Produce compiler diagnostics (necessary for version 15.0 of the Intel compiler; unnecessary for version 16.0 and higher).

Command line: /Qopt-report:5

Visual Studio* IDE: C/C++> Diagnostics [Intel C++]> Optimization Diagnostic Level> Level 5 (/Qopt-report:5)

Enable vectorization

Command line: /Qvec

Enable SIMD directives

Command line: /Qsimd

Enable generation of multi-threaded code based on OpenMP* directives.

Command line: /Qopenmp

Visual Studio* IDE: C/C++> Language [Intel C++]> OpenMP Support> Generate Parallel Code (/Qopenmp)

To Do This

Optimal Fortran Settings

Request full debug information (compiler and linker).

Command line:

  • /debug=full

  • /DEBUG

Visual Studio* IDE:

  • Fortran> General> Debug Information Format> Full (/debug=full)

  • Linker> Debugging> Generate Debug Info> Yes (/DEBUG)

Request moderate optimization.

Command line: /O2 or higher

Visual Studio* IDE: Fortran> Optimization> Optimization> Maximize Speed or higher

Produce compiler diagnostics (necessary for version 15.0 of the Intel compiler; unnecessary for version 16.0 and higher).

Command line: /Qopt-report:5

Visual Studio* IDE: Fortran> Diagnostics> Optimization Diagnostic Level> Level 5 (/Qopt-report:5)

Enable vectorization

Command line: /Qvec

Enable SIMD directives

Command line: /Qsimd

Enable generation of multi-threaded code based on OpenMP* directives.

Command line: /Qopenmp

Visual Studio* IDE: Fortran> Language> Process OpenMP Directives> Generate Parallel Code (/Qopenmp)

In addition: Verify your application runs before trying to analyze it with the Intel Advisor.

Set Up Environment

Do one of the following to set up your environment.

  • Run the <advisor-install-dir>\advixe-vars.bat command.

    The default installation path, <advisor-install-dir>, is C:\Program Files (x86)\IntelSWTools\Advisor XE 201n\ (on certain systems, the directory name is Program Files instead of Program Files (x86)).

  • Run the <parallel-studio-install-dir>\psxevars.bat command.

    The default installation path, <parallel-studio-install-dir>, is C:\Program Files (x86)\IntelSWTools\parallel_studio_xe_201n.<update number>.<package number>\.

Get Started

Follow these steps (white blocks are optional) to get started using the Vectorization Advisor in the Intel Advisor.
[Figure: Vectorization Advisor workflow]

Launch the Intel Advisor

For the standalone GUI interface, do one of the following:

  • Run the advixe-gui command.

  • From the Microsoft Windows* 7 Start menu, select Intel Parallel Studio XE 201n> Analyzers> Advisor XE.

  • From the Microsoft Windows* 8/8.1 Start screen, select Intel Advisor XE 201n.

  • From the Microsoft Windows* 8/8.1 All Apps screen, select Intel Parallel Studio XE 201n> Intel Advisor XE 201n.

For the Intel Advisor integrated into the Visual Studio* IDE: Open your solution in the Visual Studio* IDE.

Manage Project

For the standalone GUI:

  1. Choose File> New> Project… (or click New Project… in the Welcome page) to open the Create a Project dialog box.

  2. Supply a name and location for your project, then click the Create Project button to open the Project Properties dialog box.

  3. On the left side of the Analysis Target tab, ensure the Survey Hotspots/Suitability Analysis type is selected.

  4. Set the appropriate parameters. (Setting the binary/symbol search and source search directories is optional for the Vectorization Advisor.)

  5. Click the OK button to close the Project Properties dialog box and display an empty Survey Report and the VECTORIZATION WORKFLOW.

For the Visual Studio* IDE:

  1. Choose Project> Intel Advisor version Project Properties… to open the Project Properties dialog box.

  2. On the left side of the Analysis Target tab, ensure the Survey Hotspots/Suitability Analysis type is selected.

  3. Set the appropriate parameters. (Setting the binary/symbol search and source search directories is optional for the Vectorization Advisor.)

  4. Click the OK button to close the Project Properties dialog box.

Tip

  • If you plan to run other vectorization Analysis Types, set parameters for those Analysis Types now.

  • The Survey Trip Counts Analysis type has similar parameters to the Survey Hotspots/Suitability Analysis type.

  • The Dependencies Analysis and Memory Access Patterns Analysis types consume more resources than the Survey Hotspots/Suitability Analysis type. If these Refinement analyses take too long, consider decreasing the workload.

  • Select Track stack variables in the Dependencies Analysis type to detect all possible dependencies.

  • When necessary, click the control at the bottom of the WORKFLOW to switch between the VECTORIZATION WORKFLOW and THREADING WORKFLOW.

Run Survey Analysis

Under Survey Target in the VECTORIZATION WORKFLOW, click the Run control to collect Survey data while your application executes.

Note

If the VECTORIZATION WORKFLOW is not displayed:

  1. In the Solution Explorer, right-click the project.

  2. Choose Intel Advisor XE 201n> Start Survey Analysis.

After the Intel Advisor collects the data, it displays a Survey Report similar to the following:


[Figure: Survey Report]

There are many controls available to help you focus on the data most important to you, including the following:

  1. Click the various Filter controls (buttons and drop-down lists) to temporarily limit displayed data based on your criteria.

  2. Click the Search control to search for specific data.

  3. Click the Expand/Collapse controls to show/hide sets of columns.

  4. Click a loop data row in the top of the Survey Report to display more data specific to that loop in the bottom of the Survey Report. Double-click a loop data row to display a Survey Source window.

  5. Click a checkbox to mark a loop for deeper analysis.

  6. If present, click a light bulb icon to display code-specific how-can-I-fix-this-issue? information in the Recommendations pane.

  7. If present, click a book icon to display code-specific how-can-I-fix-this-issue? information in the Compiler Diagnostics pane.

  8. Click the control to show/hide the WORKFLOW pane.

Run Trip Counts Analysis

This step is optional.

Before running a Trip Counts analysis, make sure you set the appropriate Project Properties for the Survey Trip Counts Analysis type. (Use the same application, but a smaller input data set if possible.)

Under Find Trip Counts in the VECTORIZATION WORKFLOW, click the Run control to collect Trip Counts data while your application executes.

After the Intel Advisor collects the data, it adds a Trip Counts column set to the Survey Report. Median data is shown by default. Min, Max, Call Count, and Iteration Duration data are shown when the column set is expanded.

Tip

Use Trip Counts data to:

  • Detect loops with too-small trip counts and trip counts that are not a multiple of vector length.

  • Analyze parallelism granularity more deeply.
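As a rough illustration of the first point, the sketch below splits a trip count into full vector iterations and leftover iterations. The names loop_split and split_trip_count are hypothetical, and the model is simplified: it assumes the compiler generates no peeled loop, which a real build may add for alignment.

```c
#include <assert.h>

/* Hypothetical helper type: how a trip count divides across a vectorized loop. */
typedef struct {
    unsigned long vector_iterations;    /* iterations of the vectorized loop body */
    unsigned long remainder_iterations; /* leftover iterations for the remainder loop */
} loop_split;

/* Split trip_count for a given vector length (elements per vector).
   Simplified model: ignores any peeled loop the compiler may generate. */
loop_split split_trip_count(unsigned long trip_count, unsigned vector_length)
{
    loop_split s;
    assert(vector_length > 0);
    s.vector_iterations = trip_count / vector_length;
    s.remainder_iterations = trip_count % vector_length;
    return s;
}
```

Under this model, a trip count of 10 with a vector length of 8 gives one vector iteration and two remainder iterations, while a trip count of 3 never enters the vector body at all.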

Investigate Loops

The Survey Report provides a wealth of information, including:

  • Key information from the Intel compiler vectorization and optimization reports

  • Source and assembly code for the data row selected at the top of the report

  • Code-specific how-can-I-fix-this-issue? recommendations for the data row selected at the top of the report, similar to the following:
    [Figure: Recommendations pane]

    [Figure: Compiler Diagnostics pane]

Tip

  • Pay particular attention to the hottest loops in terms of Self Time and Total Time. Optimizing these loops provides the most benefit. Innermost loops and loops near innermost loops are often good candidates for vectorization. Outermost loops with significant Total Time are often good candidates for parallelization with threads.

  • Check if the best possible Vector Instruction Set is used by your application, or if there are heavy operations required for vectorization that might be a problem, such as masking or gather operations.

  • Compare the modeled Estimated Achieved Gain with the gain expected from the Vector Instruction Set to ensure you are likely to get the optimal speed-up. For example: a 256-bit AVX2 register holds eight 32-bit integers, so AVX2 processing of 32-bit integers should give up to an 8x performance gain. If the Estimated Achieved Gain is much lower than the expected gain for the Vector Instruction Set, consider optimizing an already vectorized loop by eliminating heavy vector operations, aligning data, or rewriting the loop to remove control-flow clauses.

  • A vectorized loop may not achieve the best performance when the compiler peels a source loop into peeled and remainder loops. If the peeled or remainder loop takes a significant portion of loop execution time, aligning data or changing the number of loop iterations may help.
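The expected gain mentioned in the tips above can be approximated as the number of elements that fit in one vector register. The sketch below works through that arithmetic; the function names are hypothetical, and the result is an idealized upper bound rather than a figure the Intel Advisor reports.

```c
#include <assert.h>

/* Idealized expected gain: elements per vector register.
   vector_bits is the register width (for example, 256 for AVX2);
   element_bits is the width of one data element. */
unsigned expected_vector_gain(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}

/* Fraction of the idealized gain that an estimated gain represents.
   A value well below 1.0 suggests the vectorized loop has room to improve. */
double vector_efficiency(double estimated_gain,
                         unsigned vector_bits, unsigned element_bits)
{
    return estimated_gain / (double)expected_vector_gain(vector_bits, element_bits);
}
```

For 32-bit integers in 256-bit registers, expected_vector_gain(256, 32) returns 8. An estimated gain of 4x would then correspond to an efficiency of 0.5.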

After you investigate the data in the Survey Report, you have several choices:

If Your Investigation Shows This

Do This

All loops are vectorizing properly and performance is satisfactory.

You are done! Congratulations!

One or more loops are not vectorizing properly and performance is unsatisfactory.

  1. Improve application performance using Recommendations and Compiler Diagnostic Details information to guide your efforts.

  2. Rebuild your modified code.

  3. Run another Survey analysis to verify all loops are vectorizing properly and performance is satisfactory.

You need more information (because, for example, there is a vector assumed dependency compiler diagnostic, or there are expensive memory instructions like gathers, inserts, or shuffles).

Continue your investigation by:

  1. Marking one or more loops for deeper analysis

  2. Defining the appropriate Project Properties for the Refinement analysis you plan to run

  3. Running one or more Refinement analyses

If this further investigation shows there is room for improvement:

  1. Make the improvements.

  2. Rebuild your modified code.

  3. Run another Survey analysis to verify your application still runs correctly and all test cases pass, all loops are vectorizing properly, and performance is satisfactory.

Otherwise, you are done!

Run Dependencies Analysis

This step is optional.

Before running a Dependencies analysis, make sure you:

  • Set the appropriate Project Properties for the Dependencies Analysis type. (Use the same application, but a smaller input data set if possible. And select Track stack variables to detect all possible dependencies.)

  • Mark one or more un-vectorized loops for deeper analysis in the Survey Report.

Under Check Dependencies in the VECTORIZATION WORKFLOW, click the Correctness Report control to collect Dependencies data while your application executes.

After the Intel Advisor collects the data, it displays a Dependencies-focused Refinement Report similar to the following:
[Figure: Dependencies Report]

Depending on what the Dependencies Report shows, do one or more of the following:

  • Rewrite code to remove dependencies.

  • If there is no real dependency in the loop for the given workload, use one of the following to tell the compiler it is safe to vectorize:

    • #pragma simd (ICL/ICC/ICPC directive), #pragma omp simd (OpenMP* 4.0 standard directive), or !DIR$ SIMD or !$OMP SIMD (IFORT directives) to ignore all dependencies in the loop

    • #pragma ivdep ICL/ICC/ICPC directive or !DIR$ IVDEP IFORT directive to ignore only vector dependencies (which is safest, but less powerful in certain cases)

    • restrict keyword

  • If there is an anti-dependency (often called a write-after-read dependency or WAR), enable vectorization using the #pragma simd vectorlength(k) ICL/ICC/ICPC directive or !DIR$ SIMD VECTORLENGTH(k) IFORT directive, where k is smaller than the distance between dependent items in the anti-dependency.
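The following is a minimal C sketch of two of the options above, not code from the Intel documentation. The function names and the DIST value are hypothetical. To keep the example portable, it swaps in the OpenMP* safelen clause in place of the Intel-specific vectorlength clause; the two are related but not identical, so check your compiler documentation before relying on either.

```c
#include <assert.h>
#include <stddef.h>

/* restrict promises the compiler that dst and src never overlap,
   so it need not assume a dependency between the loads and the stores. */
void add_one(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
}

#define DIST 8 /* distance between dependent items (illustrative value) */

/* Iteration i reads a[i + DIST], which iteration i + DIST later overwrites:
   a write-after-read dependency with distance DIST.
   The array must hold at least n + DIST elements. */
void shift_add(int *a, size_t n)
{
    assert(a != NULL);
    /* safelen(4) keeps concurrently executed iterations closer together
       than DIST. The guard omits the pragma when OpenMP is not enabled. */
#ifdef _OPENMP
#pragma omp simd safelen(4)
#endif
    for (size_t i = 0; i < n; i++)
        a[i] = a[i + DIST] + 1;
}
```

Before adding any such hint to real code, confirm with a Dependencies analysis that the loop has no real dependency at a shorter distance, because the compiler trusts the hint without checking it.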

After you finish making improvements:

  1. Run a MAP analysis if desired.

  2. Rebuild your modified code.

  3. Run another Survey analysis to verify your application still runs correctly and all test cases pass, all loops are vectorizing properly, and performance is satisfactory.

Run a Memory Access Patterns (MAP) Analysis

This step is optional.

Before running a MAP analysis, make sure you:

  • Set the appropriate Project Properties for the Memory Access Patterns Analysis type. (Use the same application, but a smaller input data set if possible.)

  • Mark one or more loops for deeper analysis in the Survey Report.

Under Check Memory Access Patterns in the VECTORIZATION WORKFLOW, click the Run control to collect MAP data while your application executes.

After the Intel Advisor collects the data, it displays a MAP-focused Refinement Report similar to the following:
[Figure: MAP Report]

After you finish making improvements:

  1. Rebuild your modified code.

  2. Run another Survey analysis to verify your application still runs correctly and all test cases pass, all loops are vectorizing properly, and performance is satisfactory.

Tip

Double-click source lines at the bottom of the report to get a more detailed source and assembly access pattern report where stride information is provided at the instruction level.
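To make the stride terminology concrete, here is a small C sketch contrasting a non-unit stride access with a unit stride access. The type and function names are hypothetical, and the stride of three floats assumes the struct has no padding.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structures layout: the x fields are interleaved with y and z. */
typedef struct { float x, y, z; } point_aos;

/* Non-unit stride: consecutive iterations read x values that sit
   three floats apart in memory. */
float sum_x_aos(const point_aos *pts, size_t n)
{
    float sum = 0.0f;
    assert(pts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += pts[i].x;
    return sum;
}

/* Unit stride: with a separate array of x values (structure-of-arrays
   layout), consecutive iterations read adjacent floats. */
float sum_x_soa(const float *xs, size_t n)
{
    float sum = 0.0f;
    assert(xs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum;
}
```

Both functions compute the same result. The difference a MAP analysis would be expected to highlight is the access pattern: the second loop reads contiguous memory, which is generally friendlier to vector loads.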

Troubleshooting/FAQ

Also, see https://software.intel.com/en-us/intel-advisor-xe-support/faq.

To Do This

Optimal C/C++ Settings

Retrieve better compiler diagnostics.

Disable Interprocedural Optimization (IPO):

  • Command line: /Qipo-

  • Visual Studio* IDE: C/C++> Optimization [Intel C++]> Interprocedural Optimization> No

Address any issues with source line matching.

Do one of the following:

  • Raise the debug level:

    • Command line: /debug:inline-debug-info

    • Visual Studio* IDE: You can add this option to Command line> Additional Options.

  • Temporarily disable inlining:

    • Command line: /Qip-no-inlining

    • Visual Studio* IDE: C/C++> Optimization> Inline Function Expansion> Only __inline (/Ob1) or higher

Experiment with generating code for different instructions.

Command line: /QxHost, /QxSSE4.2, /QxAVX, /QaxAVX, /QxCORE-AVX2, or /QaxCORE-AVX2

Visual Studio* IDE: C/C++> Code Generation [Intel C++]> Intel Processor-Specific Optimization

To Do This

Optimal Fortran Settings

Retrieve better compiler diagnostics.

Disable Interprocedural Optimization (IPO):

  • Command line: /Qipo-

  • Visual Studio* IDE: Fortran> Optimization> Interprocedural Optimization> No

Address any issues with source line matching.

Do one of the following:

  • Raise the debug level:

    • Command line: /debug:inline-debug-info

    • Visual Studio* IDE: You can add this option to Command line> Additional Options.

  • Temporarily disable inlining:

    • Command line: /Qip-no-inlining

    • Visual Studio* IDE: Fortran> Optimization> Inline Function Expansion> Only INLINE Directive (/Ob1) or higher

Experiment with generating code for different instructions.

Command line: /QxHost, /QxSSE4.2, /QxAVX, /QaxAVX, /QxCORE-AVX2, or /QaxCORE-AVX2

Visual Studio* IDE: Fortran> Code Generation> Intel Processor-Specific Optimization

