Intel® Advisor 2016 offers a vectorization analysis tool and a threading design and prototyping tool to help ensure your Fortran, C and C++ native/managed applications take full performance advantage of today’s processors. This topic discusses getting started with the Threading Advisor GUI on a Windows* platform.
Key Features
Key Threading Advisor features include the following:
Survey Report - Shows the loops and functions where your application spends the most time. Use this information to discover candidates for parallelization with threads.
Trip Counts analysis - Shows the minimum, maximum, and median number of times a loop body will execute, as well as the number of times a loop is invoked. Use this information to make better decisions about your threading strategy for particular loops.
Annotations - Insert annotations to mark places in your application that are good candidates for later replacement with parallel framework code that enables parallel execution with threads. Annotations are subroutine calls or macros (depending on the programming language) that your current compiler can process but that do not change the computations of your application.
Suitability Report - Predicts the maximum speed-up of your application based on the inserted annotations and a variety of what-if modeling parameters with which you can experiment. Use this information to choose the best candidates for parallelization with threads.
Dependencies Report - Predicts parallel data sharing problems based on the inserted annotations. Use this information to fix the data sharing problems if the predicted maximum speed-up benefit justifies the effort.
Prerequisites
To build applications that produce the most accurate and complete Threading Advisor analysis results, build an optimized binary of your application in release mode using these settings:
To Do This | Optimal C/C++ Settings |
---|---|
Search additional directory related to Intel Advisor annotation definitions. | Command line: /I"%ADVISOR_XE_2016_DIR%"\include Microsoft Visual Studio* IDE: C/C++> General> Additional Include Directories> $(ADVISOR_XE_2016_DIR)\include;%(AdditionalIncludeDirectories) |
Request full debug information (compiler and linker). | |
Request moderate optimization. | |
Search for unresolved references in multithreaded, dynamically linked libraries. | Command line: /MD or /MDd Visual Studio* IDE: C/C++> Code Generation> Runtime Library> Multi-threaded DLL (/MD) or Multi-threaded Debug DLL (/MDd) |
To Do This | Optimal Fortran Settings |
---|---|
Search additional directory related to Intel Advisor annotation definitions. | |
Request full debug information (compiler and linker). | |
Request moderate optimization. | |
Search for unresolved references in multithreaded, dynamically linked libraries. | Command line: /MD or /MDd Visual Studio* IDE: Fortran> Libraries> Runtime Library> Multithread DLL (/libs:dll /threads) or Debug Multithread DLL (/libs:dll /threads /dbglibs) |
In addition: Verify your application runs before trying to analyze it with the Intel Advisor.
Set Up Environment
Do one of the following to set up your environment.
Run the <advisor-install-dir>\advixe-vars.bat command.
The default installation path, <advisor-install-dir>, is C:\Program Files (x86)\IntelSWTools\Advisor XE 201n\ (on certain systems, the directory name is Program Files instead of Program Files (x86)).
Run the <parallel-studio-install-dir>\psxevars.bat command.
The default installation path, <parallel-studio-install-dir>, is C:\Program Files (x86)\IntelSWTools\parallel_studio_xe_201n.<update number>.<package number>\.
Get Started
Follow these steps to get started using the Threading Advisor in the Intel Advisor. Optional steps are marked as such.
Launch the Intel Advisor
For the standalone GUI interface, do one of the following:
Run the advixe-gui command.
From the Microsoft Windows* 7 Start menu, select Intel Parallel Studio XE 201n> Analyzers> Advisor XE.
From the Microsoft Windows* 8/8.1 Start screen, select Intel Advisor XE 201n.
From the Microsoft Windows* 8/8.1 All Apps screen, select Intel Parallel Studio XE 201n> Intel Advisor XE 201n.
For the Intel Advisor integrated into the Visual Studio* IDE: Open your solution in the Visual Studio* IDE.
Manage Project
For the standalone GUI:
Choose File> New> Project… (or click New Project… in the Welcome page) to open the Create a Project dialog box.
Supply a name and location for your project, then click the Create Project button to open the Project Properties dialog box.
On the left side of the Analysis Target tab, ensure the Survey Hotspots/Suitability Analysis type is selected.
Set the appropriate parameters, and binary/symbol search and source search directories.
After you click the OK button to close the Project Properties dialog box, the Intel Advisor displays an empty Survey Report and the VECTORIZATION WORKFLOW.
Click the control at the bottom of the WORKFLOW to switch between the VECTORIZATION WORKFLOW and THREADING WORKFLOW.
For the Intel Advisor integrated into the Visual Studio* IDE:
Choose Project> Intel Advisor version Project Properties… to open the Project Properties dialog box.
On the left side of the Analysis Target tab, ensure the Survey Hotspots/Suitability Analysis type is selected.
Set the appropriate parameters, and binary/symbol search and source search directories.
Click the OK button to close the Project Properties dialog box.
Tip
If possible, set parameters for all threading Analysis Types now.
The Survey Trip Counts Analysis type has similar parameters to the Survey Hotspots/Suitability Analysis type.
The Dependencies Analysis type consumes more resources than the Survey Hotspots/Suitability Analysis type. If the Dependencies analysis takes too long, consider decreasing the workload.
Run Survey Analysis
Under Survey Target in the THREADING WORKFLOW, click the control to collect Survey data while your application executes. Use the resulting information to discover candidates for parallelization with threads.
Run Trip Counts Analysis
This step is optional.
Before running a Trip Counts analysis, make sure you set the appropriate Project Properties for the Survey Trip Counts Analysis type.
Under Find Trip Counts in the THREADING WORKFLOW, click the control to collect Trip Counts data while your application executes. Use the resulting information to make better decisions about your threading strategy for particular loops.
Investigate Loops
Pay particular attention to the hottest loops in terms of Self Time and Total Time. Optimizing these loops provides the most benefit. Outermost loops with significant Total Time are often good candidates for parallelization with threads. Innermost loops and loops near innermost loops are often good candidates for vectorization.
Annotate Sources
Insert annotations to mark places in parts of your application that are good candidates for later replacement with parallel framework code that enables parallel execution.
The main types of Intel Advisor annotations mark the location of:
A parallel site. A parallel site is a region of code that contains one or more tasks that may execute in one or more parallel threads to distribute work. An effective parallel site typically contains a hotspot that consumes application execution time. To distribute these frequently executed instructions to different tasks that can run at the same time, the best parallel site is not usually located at the hotspot, but higher in the call tree.
One or more parallel tasks within a parallel site. A task is a portion of time-consuming code with data that can be executed in one or more parallel threads to distribute work.
Locking synchronization, where mutual exclusion of data access must occur in the parallel application.
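The three annotation types above can be sketched in C++ as follows. This is a minimal illustration, not code taken from Intel Advisor: the function `count_above` and the site and task names are hypothetical, and the no-op macro definitions are stand-ins so the sketch compiles without the product installed. In a real build you would remove the stand-ins and include `advisor-annotate.h` from the annotation include directory instead.

```cpp
#include <cstddef>
#include <vector>

// Stand-in no-op macros (assumption: used only so this sketch compiles
// without Intel Advisor). In a real build, delete these and
// #include "advisor-annotate.h" instead.
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END()
#define ANNOTATE_ITERATION_TASK(name)
#define ANNOTATE_LOCK_ACQUIRE(addr)
#define ANNOTATE_LOCK_RELEASE(addr)

// Hypothetical hotspot: counts how many values exceed a threshold.
std::size_t count_above(const std::vector<double>& values, double threshold) {
    std::size_t count = 0;                    // shared by all iterations
    ANNOTATE_SITE_BEGIN(count_site);          // parallel site encloses the loop
    for (std::size_t i = 0; i < values.size(); ++i) {
        ANNOTATE_ITERATION_TASK(count_task);  // each iteration is one task
        if (values[i] > threshold) {
            ANNOTATE_LOCK_ACQUIRE(0);         // shared update needs mutual exclusion
            ++count;
            ANNOTATE_LOCK_RELEASE(0);
        }
    }
    ANNOTATE_SITE_END();                      // end of the parallel site
    return count;
}
```

Because the annotations do not change the computation, `count_above` returns the same result with or without them; they only mark where a site, its tasks, and a lock would go in later parallel code.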
Intel Advisor provides example annotated source code for you (accessible in the Assistance tab of the Survey Report and in the Survey Source windows) that you can copy directly into your editor:
Annotation Code Snippet | Purpose |
---|---|
Iteration Loop, Single Task | Create a simple loop structure, where the task code includes the entire loop body. This common task structure is useful when only a single task is needed within a parallel site. |
Loop, One or More Tasks | Create loops where the task code does not include all of the loop body, or complex loops or code that requires specific task begin-end boundaries, including multiple task end annotations. This structure is also useful when multiple tasks are needed within a parallel site. |
Function, One or More Tasks | Create code that calls multiple tasks within a parallel site. |
Pause/Resume Collection | Temporarily pause data collection and later resume it, so you can skip uninteresting parts of application execution to minimize collected data and speed up analysis of large applications. Add these annotations outside a parallel site. |
Build Settings | Set build (compiler and linker) settings specific to the language in use. |
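As a rough illustration of the "Loop, One or More Tasks" and "Pause/Resume Collection" rows, the sketch below uses explicit task begin-end boundaries around part of a loop body and pauses collection around a setup phase. The function `process` and the site and task names are hypothetical, and the no-op macros are stand-ins for the definitions in `advisor-annotate.h`; the snippets offered in the Assistance tab remain the authoritative versions.

```cpp
#include <cstddef>
#include <vector>

// Stand-in no-op macros (assumption: used only so this sketch compiles
// without Intel Advisor). In a real build, delete these and
// #include "advisor-annotate.h" instead.
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END()
#define ANNOTATE_TASK_BEGIN(name)
#define ANNOTATE_TASK_END()
#define ANNOTATE_DISABLE_COLLECTION_PUSH
#define ANNOTATE_DISABLE_COLLECTION_POP

// Hypothetical workload: an uninteresting setup phase, then a hot loop.
double process(std::vector<double>& data) {
    ANNOTATE_DISABLE_COLLECTION_PUSH;      // pause collection, outside any site
    for (double& x : data) {
        x *= 0.5;                          // setup work we do not want analyzed
    }
    ANNOTATE_DISABLE_COLLECTION_POP;       // resume collection

    double checksum = 0.0;
    ANNOTATE_SITE_BEGIN(process_site);
    for (std::size_t i = 0; i < data.size(); ++i) {
        ANNOTATE_TASK_BEGIN(square_task);  // task covers only part of the body
        data[i] = data[i] * data[i];
        ANNOTATE_TASK_END();
        checksum += data[i];               // outside the task, stays serial
    }
    ANNOTATE_SITE_END();
    return checksum;
}
```

Keeping the `checksum` update outside the task boundaries is one way to model code that will remain serial inside the site.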
Tip
Choosing where to add task annotations may require some experimentation. If your parallel site has nested loops and the computation time used by the innermost loop is small, consider adding task annotations around the next outer loop instead.
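The tip above might look like this in practice. In this hypothetical sketch, the inner loop over one row does very little work per iteration, so the task annotation is placed on the outer loop and each task covers a whole row. The function `row_sums` and the site and task names are invented for illustration, and the no-op macros stand in for `advisor-annotate.h`.

```cpp
#include <cstddef>
#include <vector>

// Stand-in no-op macros (assumption: used only so this sketch compiles
// without Intel Advisor). In a real build, delete these and
// #include "advisor-annotate.h" instead.
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END()
#define ANNOTATE_ITERATION_TASK(name)

// Hypothetical example: sum each row of a matrix stored as nested vectors.
std::vector<double> row_sums(const std::vector<std::vector<double>>& m) {
    std::vector<double> sums(m.size(), 0.0);
    ANNOTATE_SITE_BEGIN(rows_site);
    for (std::size_t i = 0; i < m.size(); ++i) {
        ANNOTATE_ITERATION_TASK(row_task);   // one task per row (outer loop)
        for (std::size_t j = 0; j < m[i].size(); ++j) {
            sums[i] += m[i][j];              // inner loop: too little work per task
        }
    }
    ANNOTATE_SITE_END();
    return sums;
}
```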
Run Suitability Analysis
After you insert annotations into your source code, rebuild your application in release mode.
Under Check Suitability in the THREADING WORKFLOW, click the control to collect Suitability data while your application executes.
After the Intel Advisor collects the data, it displays a Suitability Report.
The Suitability Report predicts maximum speed-up data based on the inserted annotations and what-if modeling parameters with which you can experiment, such as:
Different hardware configurations and parallel frameworks
Different trip counts and instance durations
Any plans to address parallel overhead, lock contention, or task chunking when you implement your parallel framework code
Use the resulting information to choose the best candidates for parallelization with threads.
For example: The Scalability of Maximum Site Gain diagram graphically shows the predicted maximum speed-up for a selected parallel site in different scaling scenarios based on currently selected modeling parameters.
A Bulls-Eye in This Area | Means This |
---|---|
Red | Parallelization with threads is not beneficial - and may even cause performance degradation. Consider removing or modifying annotations, or significantly refactoring the corresponding hotspot if you want to parallelize it at any cost. |
Yellow | The predicted maximum speed-up may not be enough to justify the effort needed to refactor and maintain your application. Consider investigating. |
Green | Parallel performance - and power efficiency - may improve significantly. |
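For rough intuition about why a site can land in the red or yellow area, Amdahl's law gives a simple upper bound on speed-up. This is not the model the Suitability Report uses (its predictions also account for the modeling parameters listed above); the function `amdahl_speedup` below is a hypothetical helper that assumes a fraction `p` of run time parallelizes perfectly across `n` threads with no overhead.

```cpp
// Amdahl's law: speed-up = 1 / ((1 - p) + p / n)
//   p: fraction of serial run time that can run in parallel (0.0 to 1.0)
//   n: number of threads (n >= 1)
// Assumption: perfect scaling of the parallel part and zero overhead.
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / static_cast<double>(n));
}
```

For example, if only half of the run time is inside the parallel site, `amdahl_speedup(0.5, 2)` is about 1.33, and no thread count can push the bound past 2.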
Run Dependencies Analysis
Before running a Dependencies analysis, make sure you set the appropriate Project Properties for the Dependencies Analysis type. (Use the same application, but a smaller input data set if possible.)
Under Check Dependences in the THREADING WORKFLOW, click the control to collect Dependencies data while your application executes. Use the resulting information to fix the data sharing problems if the predicted maximum speed-up benefit justifies the effort.
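To make "data sharing problem" concrete, the sketch below contrasts two hypothetical versions of the same annotated loop. In the first, every task reads and writes the same variable, which is the kind of access that becomes a problem once tasks really run in parallel. In the second, each task writes only its own slot and the results are combined after the site. The function names are invented for illustration, the no-op macros stand in for `advisor-annotate.h`, and what the Dependencies Report actually flags depends on your code and input data.

```cpp
#include <cstddef>
#include <vector>

// Stand-in no-op macros (assumption: used only so this sketch compiles
// without Intel Advisor). In a real build, delete these and
// #include "advisor-annotate.h" instead.
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END()
#define ANNOTATE_ITERATION_TASK(name)

// Version A: all tasks update the shared variable `total`.
double sum_squares_shared(const std::vector<double>& v) {
    double total = 0.0;
    ANNOTATE_SITE_BEGIN(sum_site);
    for (std::size_t i = 0; i < v.size(); ++i) {
        ANNOTATE_ITERATION_TASK(sum_task);
        total += v[i] * v[i];            // read-modify-write shared by every task
    }
    ANNOTATE_SITE_END();
    return total;
}

// Version B: each task writes only its own element of `partial`.
double sum_squares_private(const std::vector<double>& v) {
    std::vector<double> partial(v.size(), 0.0);
    ANNOTATE_SITE_BEGIN(sum_site);
    for (std::size_t i = 0; i < v.size(); ++i) {
        ANNOTATE_ITERATION_TASK(sum_task);
        partial[i] = v[i] * v[i];        // no data shared between tasks
    }
    ANNOTATE_SITE_END();
    double total = 0.0;
    for (double p : partial) {
        total += p;                      // serial combine after the site
    }
    return total;
}
```

Version B trades extra memory for independence between tasks; guarding the update in Version A with lock annotations is the other common option.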
Improve App Performance
This step is optional.
If you decide the predicted maximum speed-up benefit is worth the effort to add threading parallelism to your application, do the following:
1. Complete developer/architect design and code reviews about the proposed parallel changes.
2. Choose one parallel programming framework (threading model) for your application, such as Intel® Threading Building Blocks (Intel® TBB), OpenMP*, Intel® Cilk™ Plus, Microsoft Task Parallel Library* (TPL), or some other parallel framework.
3. Add the parallel framework to your build environment.
4. Add parallel framework code to synchronize access to the shared data resources, such as Intel TBB or OpenMP* locks or Intel Cilk Plus reducers.
5. Add parallel framework code to create parallel tasks.
As you add the appropriate parallel code from the chosen parallel framework during steps 4 and 5, you can keep, comment out, or replace the Intel Advisor annotations.
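As one possible outcome of this step, the sketch below shows a loop in which a parallel site with one task per iteration has been replaced by an OpenMP* construct. It assumes OpenMP* is the chosen framework and that the only shared data is an accumulator, which the `reduction` clause handles by giving each thread a private copy and combining the copies at the end. The function `sum_squares_parallel` is hypothetical. If the compiler's OpenMP* option is not enabled, the pragma is ignored and the loop runs serially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical hotspot after conversion from annotations to OpenMP.
double sum_squares_parallel(const std::vector<double>& v) {
    double total = 0.0;
    // Signed loop index, which older OpenMP implementations require.
    const long n = static_cast<long>(v.size());
    // Replaces the site and iteration-task annotations: iterations are
    // divided among threads, and `total` is combined safely per the
    // reduction clause instead of with an explicit lock.
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        const double x = v[static_cast<std::size_t>(i)];
        total += x * x;
    }
    return total;
}
```

A reduction is only one way to handle the shared accumulator; an explicit OpenMP* lock or critical section would match a lock annotation more directly, usually at a higher synchronization cost.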