
Threading Advisor-Linux* OS

Intel® Advisor 2016 offers a vectorization analysis tool and a threading design and prototyping tool to help ensure your Fortran, C and C++ applications take full performance advantage of today’s processors. This topic discusses getting started with the Threading Advisor GUI on a Linux* platform.

Key Features

Key Threading Advisor features include the following:

  • Survey Report - Shows the loops and functions where your application spends the most time. Use this information to discover candidates for parallelization with threads.

  • Trip Counts analysis - Shows the minimum, maximum, and median number of times a loop body will execute, as well as the number of times a loop is invoked. Use this information to make better decisions about your threading strategy for particular loops.

  • Annotations - Insert these to mark places in your application that are good candidates for later replacement with parallel framework code that enables threaded parallel execution. Annotations are subroutine calls or macros (depending on the programming language) that can be processed by your current compiler but do not change the computations of your application.

  • Suitability Report - Predicts the maximum speed-up of your application based on the inserted annotations and a variety of what-if modeling parameters with which you can experiment. Use this information to choose the best candidates for parallelization with threads.

  • Dependencies Report - Predicts parallel data sharing problems based on the inserted annotations. Use this information to fix the data sharing problems if the predicted maximum speed-up benefit justifies the effort.

Prerequisites

To build applications that produce the most accurate and complete Threading Advisor analysis results, build an optimized binary of your application in release mode using these settings:

Optimal C/C++ settings:

  • Search additional directory related to Intel Advisor annotation definitions: -I${ADVISOR_XE_2016_DIR}/include

  • Request full debug information (compiler and linker): -g

  • Request moderate optimization: -O2 or higher

  • Search for unresolved references in multithreaded, dynamically linked libraries: -Bdynamic

  • Enable dynamic loading: -ldl

Optimal Fortran settings:

  • Search additional directory related to Intel Advisor annotation definitions:

    • -I${ADVISOR_XE_2016_DIR}/include/ia32 or -I${ADVISOR_XE_2016_DIR}/include/intel64

    • -L${ADVISOR_XE_2016_DIR}/lib32 or -L${ADVISOR_XE_2016_DIR}/lib64

    • -ladvisor

  • Request full debug information (compiler and linker): -g

  • Request moderate optimization: -O2 or higher

  • Search for unresolved references in multithreaded, dynamically linked libraries: -shared-intel

  • Enable dynamic loading: -ldl

In addition:

  • Verify your application runs before trying to analyze it with the Intel Advisor.

  • Make sure you run the Intel Advisor in the same environment as your application.

Set Up Environment

Do one of the following to set up your environment.

  • Run one of the following source commands:

    • For csh/tcsh users: source <advisor-install-dir>/advixe-vars.csh

    • For bash users: source <advisor-install-dir>/advixe-vars.sh

    The default installation path, <advisor-install-dir>, is:

    • For root users: /opt/intel/parallel_studio_xe_201n/advisor_xe_201n/

    • For non-root users: $HOME/intel/parallel_studio_xe_201n/advisor_xe_201n/

  • Add <advisor-install-dir>/bin32 or <advisor-install-dir>/bin64 to your path.

  • Run the <parallel-studio-install-dir>/psxevars.csh or <parallel-studio-install-dir>/psxevars.sh command. The default installation path, <parallel-studio-install-dir>, is:

    • For root users: /opt/intel/parallel_studio_xe_201n/

    • For non-root users: $HOME/intel/parallel_studio_xe_201n/

Get Started

Follow these steps (white blocks are optional) to get started using the Threading Advisor in the Intel Advisor.

[Figure: Threading Advisor workflow]

Launch the Intel Advisor

Run the advixe-gui command.

Manage Project

  1. Choose File > New > Project… (or click New Project… in the Welcome page) to open the Create a Project dialog box.

  2. Supply a name and location for your project, then click the Create Project button to open the Project Properties dialog box.

  3. On the left side of the Analysis Target tab, ensure the Survey Hotspots/Suitability Analysis type is selected.

  4. Set the appropriate parameters, and binary/symbol search and source search directories.

  5. After you click the OK button to close the Project Properties dialog box, the Intel Advisor displays an empty Survey Report and the VECTORIZATION WORKFLOW.

  6. Click the control at the bottom of the WORKFLOW to switch between the VECTORIZATION WORKFLOW and THREADING WORKFLOW.

Tip

  • If possible, set parameters for all threading Analysis Types now.

  • The Survey Trip Counts Analysis type has similar parameters to the Survey Hotspots/Suitability Analysis type.

  • The Dependencies Analysis type consumes more resources than the Survey Hotspots/Suitability Analysis type. If the Dependencies analysis takes too long, consider decreasing the workload.

Run Survey Analysis

Under Survey Target in the THREADING WORKFLOW, click the Run control to collect Survey data while your application executes. Use the resulting information to discover candidates for parallelization with threads.

Run Trip Counts Analysis

This step is optional.

Before running a Trip Counts analysis, make sure you set the appropriate Project Properties for the Survey Trip Counts Analysis type.

Under Find Trip Counts in the THREADING WORKFLOW, click the Run control to collect Trip Counts data while your application executes. Use the resulting information to make better decisions about your threading strategy for particular loops.

Investigate Loops

Pay particular attention to the hottest loops in terms of Self Time and Total Time. Optimizing these loops provides the most benefit. Outermost loops with significant Total Time are often good candidates for parallelization with threads. Innermost loops and loops near innermost loops are often good candidates for vectorization.
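To make the outermost/innermost distinction concrete, here is a minimal sketch of a loop nest. The function name `row_sums` and the row-major matrix layout are assumptions chosen for illustration; they do not come from any Intel Advisor sample.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical workload: sum each row of a row-major matrix.
std::vector<double> row_sums(const std::vector<double>& m,
                             std::size_t rows, std::size_t cols) {
    std::vector<double> out(rows, 0.0);
    // Outermost loop: iterations are independent of each other and the
    // loop accounts for nearly all of the Total Time, so it is the kind
    // of loop to consider for parallelization with threads.
    for (std::size_t r = 0; r < rows; ++r) {
        double s = 0.0;
        // Innermost loop: short body over contiguous memory, so it is
        // the kind of loop to consider for vectorization instead.
        for (std::size_t c = 0; c < cols; ++c) {
            s += m[r * cols + c];
        }
        out[r] = s;
    }
    return out;
}
```

In a Survey Report for code like this, the inner loop would typically show high Self Time, while the outer loop would show high Total Time with little Self Time.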

Annotate Sources

Insert annotations to mark places in your application that are good candidates for later replacement with parallel framework code that enables parallel execution.

The main types of Intel Advisor annotations mark the location of:

  • A parallel site. A parallel site is a region of code that contains one or more tasks that may execute in one or more parallel threads to distribute work. An effective parallel site typically contains a hotspot that consumes application execution time. To distribute these frequently executed instructions to different tasks that can run at the same time, the best parallel site is not usually located at the hotspot, but higher in the call tree.

  • One or more parallel tasks within a parallel site. A task is a portion of time-consuming code with data that can be executed in one or more parallel threads to distribute work.

  • Locking synchronization, where mutual exclusion of data access must occur in the parallel application.
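The three annotation types above might be combined as in the following sketch. The macros are the C/C++ annotation macros from `advisor-annotate.h`. The function `count_above` and the site and task names are hypothetical. The no-op fallback definitions are an assumption added only so the sketch compiles when the Intel Advisor header is not on the include path.

```cpp
#include <cstddef>
#include <vector>

// Use the real annotation header when it is available; otherwise fall
// back to no-op stand-ins (assumption, for illustration only).
#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#    define HAVE_ADVISOR_ANNOTATE_H 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE_H
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END(...)
#  define ANNOTATE_ITERATION_TASK(name)
#  define ANNOTATE_LOCK_ACQUIRE(addr)
#  define ANNOTATE_LOCK_RELEASE(addr)
#endif

// Hypothetical hotspot: count the values that exceed a threshold.
std::size_t count_above(const std::vector<double>& v, double threshold) {
    std::size_t count = 0;  // written by every iteration, so it is shared data
    ANNOTATE_SITE_BEGIN(count_site);           // parallel site starts here
    for (std::size_t i = 0; i < v.size(); ++i) {
        ANNOTATE_ITERATION_TASK(count_task);   // each iteration is one task
        if (v[i] > threshold) {
            ANNOTATE_LOCK_ACQUIRE(0);          // mark the update as mutually exclusive
            ++count;
            ANNOTATE_LOCK_RELEASE(0);
        }
    }
    ANNOTATE_SITE_END();                       // parallel site ends here
    return count;
}
```

The annotated function still runs serially and returns the same result as before; the annotations only tell the analysis where a site, its tasks, and its locked region would be.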

Intel Advisor provides example annotated source code for you (accessible in the Assistance tab of the Survey Report and in the Survey Source windows) that you can copy directly into your editor:

  • Iteration Loop, Single Task - Create a simple loop structure, where the task code includes the entire loop body. This common task structure is useful when only a single task is needed within a parallel site.

  • Loop, One or More Tasks - Create loops where the task code does not include all of the loop body, or complex loops or code that requires specific task begin-end boundaries, including multiple task end annotations. This structure is also useful when multiple tasks are needed within a parallel site.

  • Function, One or More Tasks - Create code that calls multiple tasks within a parallel site.

  • Pause/Resume Collection - Temporarily pause data collection and later resume it, so you can skip uninteresting parts of application execution to minimize collected data and speed up analysis of large applications. Add these annotations outside a parallel site.

  • Build Settings - Set build (compiler and linker) settings specific to the language in use.
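As a rough illustration of the "Function, One or More Tasks" structure, the sketch below marks two function calls as separate tasks inside one site, using explicit task begin and end annotations. It is not the snippet text that Intel Advisor itself provides. The functions `scale_in_place`, `sum_of`, and `process` are hypothetical, and the no-op fallback macros are an assumption so that the sketch compiles without the Intel Advisor header.

```cpp
#include <numeric>
#include <vector>

// Use the real annotation header when it is available; otherwise fall
// back to no-op stand-ins (assumption, for illustration only).
#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#    define HAVE_ADVISOR_ANNOTATE_H 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE_H
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_SITE_END(...)
#  define ANNOTATE_TASK_BEGIN(name)
#  define ANNOTATE_TASK_END(...)
#endif

// Two hypothetical stages that work on separate data.
void scale_in_place(std::vector<double>& v, double k) {
    for (double& x : v) x *= k;
}

double sum_of(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

double process(std::vector<double>& a, const std::vector<double>& b) {
    double total_b = 0.0;
    ANNOTATE_SITE_BEGIN(process_site);
    ANNOTATE_TASK_BEGIN(scale_task);   // first task: modifies only a
    scale_in_place(a, 2.0);
    ANNOTATE_TASK_END();
    ANNOTATE_TASK_BEGIN(sum_task);     // second task: reads only b
    total_b = sum_of(b);
    ANNOTATE_TASK_END();
    ANNOTATE_SITE_END();
    return total_b;
}
```

Because the two tasks touch disjoint data, a Dependencies analysis would not be expected to report sharing problems between them. If both tasks wrote to the same vector, it would.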

Tip

Choosing where to add task annotations may require some experimentation. If your parallel site has nested loops and the computation time used by the innermost loop is small, consider adding task annotations around the next outermost loop.

Run Suitability Analysis

After you insert annotations into your source code, rebuild your application in release mode.

Under Check Suitability in the THREADING WORKFLOW, click the Run control to collect Suitability data while your application executes.

After the Intel Advisor collects the data, it displays a Suitability Report similar to the following:

[Figure: Suitability Report]

The Suitability Report predicts maximum speed-up data based on the inserted annotations and what-if modeling parameters with which you can experiment, such as:

  • Different hardware configurations and parallel frameworks

  • Different trip counts and instance durations

  • Any plans to address parallel overhead, lock contention, or task chunking when you implement your parallel framework code

Use the resulting information to choose the best candidates for parallelization with threads.

For example: The Scalability of Maximum Site Gain diagram graphically shows the predicted maximum speed-up for a selected parallel site in different scaling scenarios based on currently selected modeling parameters.

A bulls-eye in each area of the diagram means the following:

  • Red - Parallelization with threads is not beneficial - and may even cause performance degradation. Consider removing or modifying annotations, or significantly refactoring the corresponding hotspot if you want to parallelize it at any cost.

  • Yellow - The predicted maximum speed-up may not be enough to justify the effort needed to refactor and maintain your application. Consider investigating.

  • Green - Parallel performance - and power efficiency - may improve significantly.
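For a rough intuition about why a site can fall short of the green area, the sketch below computes the classic Amdahl's law upper bound on speed-up. This is not the model that the Suitability analysis uses, which also accounts for factors such as parallel overhead and lock contention. The function name `amdahl_speedup` is an assumption for illustration.

```cpp
// Amdahl's law: upper bound on speed-up when a fraction of the serial
// run time (parallel_fraction, between 0 and 1) is spread evenly over
// `threads` threads and the rest stays serial. Ignores all overhead.
double amdahl_speedup(double parallel_fraction, unsigned threads) {
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads);
}
```

For example, if only half of the run time lies inside the annotated site, the bound stays below 2x no matter how many threads are modeled, which is one reason to place the parallel site where it covers as much of the hot code as possible.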

Run Dependencies Analysis

Before running a Dependencies analysis, make sure you set the appropriate Project Properties for the Dependencies Analysis type. (Use the same application, but a smaller input data set if possible.)

Under Check Dependences in the THREADING WORKFLOW, click the Run control to collect Dependencies data while your application executes. Use the resulting information to fix the data sharing problems if the predicted maximum speed-up benefit justifies the effort.

Improve App Performance

This step is optional.

If you decide the predicted maximum speed-up benefit is worth the effort to add threading parallelism to your application:

  1. Complete developer/architect design and code reviews about the proposed parallel changes.

  2. Choose one parallel programming framework (threading model) for your application, such as Intel® Threading Building Blocks (Intel® TBB), OpenMP*, Intel® Cilk™ Plus, or some other parallel framework.

  3. Add the parallel framework to your build environment.

  4. Add parallel framework code to synchronize access to the shared data resources, such as Intel TBB or OpenMP* locks or Intel Cilk Plus reducers.

  5. Add parallel framework code to create parallel tasks.

As you add the appropriate parallel code from the chosen parallel framework during steps 4 and 5, you can keep, comment out, or replace the Intel Advisor annotations.
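As one possible outcome of steps 4 and 5, the sketch below shows a loop of the kind that a site and iteration-task annotation pair might have marked, rewritten with OpenMP*. The function `count_above_omp` is hypothetical. The OpenMP `reduction` clause here takes the place of what a lock annotation would have marked around the shared counter; an `omp critical` section would be a more literal, but usually slower, replacement.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical hotspot after conversion to OpenMP: count the values
// that exceed a threshold.
std::size_t count_above_omp(const std::vector<double>& v, double threshold) {
    std::size_t count = 0;
    const long n = static_cast<long>(v.size());
    // The parallel for creates the tasks (one chunk of iterations per
    // thread); the reduction gives each thread a private counter and
    // combines them at the end, so no explicit lock is needed.
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < n; ++i) {
        if (v[static_cast<std::size_t>(i)] > threshold) {
            ++count;
        }
    }
    return count;
}
```

Built without an OpenMP compiler option, the pragma is ignored and the loop runs serially with the same result, which makes it easy to compare the two versions.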

