Intel® VTune™ Amplifier XE can be installed on Windows*, OS X*, and Linux* hosts and used to analyze local and remote target systems. Use this tool to evaluate algorithm choices, find serial and parallel code bottlenecks, understand where and how your application can benefit from the available hardware resources, and speed up its execution.
VTune Amplifier XE is available as a standalone product as well as part of the following suites:
Key Features
ALGORITHM ANALYSIS
(Preview) Run HPC Performance Characterization analysis to identify how effectively your application uses CPU, memory, and floating-point hardware resources. This analysis type is a good starting point for understanding the performance aspects of your application. Additional scalability metrics are available for applications that use the Intel OpenMP* or Intel MPI runtime libraries.
Run Basic Hotspots analysis to understand the application flow and identify the sections of code that consume the most execution time (hotspots). See tutorials for Linux host - C++ sample | Linux host - Fortran sample | Windows host - C++ sample | Windows host - Fortran sample.
Use Advanced Hotspots analysis to extend Basic Hotspots analysis by collecting call stacks and context switches and by analyzing the CPI (Cycles Per Instruction) metric. See the tutorial for a C++ sample running on the Intel Xeon Phi™ coprocessor: Windows host | Linux host.
Run Concurrency analysis to estimate the parallelism in your code and understand how effectively your application uses the available cores. See tutorials for Linux host - Fortran sample | Windows host - Fortran sample.
Run Locks and Waits analysis to identify synchronization objects that prevent effective utilization of processor resources. See tutorials for Linux host - C++ sample code | Windows host - C++ sample code.
(Figure: Basic Hotspots Analysis)
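Conceptually, a sampling-based hotspot list ranks functions by how many samples land in them, and the CPI metric divides unhalted core cycles by retired instructions. The Python sketch below illustrates both ideas under simplifying assumptions: a fixed sampling interval, and samples already reduced to the name of the leaf (top-of-stack) function. The functions `hotspots` and `cpi` are hypothetical helpers for illustration, not part of any VTune Amplifier API.

```python
from collections import Counter


def hotspots(leaf_samples, interval_ms):
    """Rank functions by estimated self time.

    leaf_samples: function names, one per sample, taken from the leaf
    frame of each sampled call stack (assumed already symbolized).
    interval_ms: assumed fixed sampling interval in milliseconds.
    Returns (function, self_time_ms) pairs, hottest first.
    """
    counts = Counter(leaf_samples)
    # most_common() sorts by count; ties keep first-seen order.
    return [(fn, n * interval_ms) for fn, n in counts.most_common()]


def cpi(cpu_clk_unhalted, inst_retired):
    """Cycles Per Instruction: unhalted core cycles / instructions retired."""
    if inst_retired == 0:
        raise ValueError("no instructions retired")
    return cpu_clk_unhalted / inst_retired


samples = ["multiply", "multiply", "init", "multiply", "main"]
print(hotspots(samples, interval_ms=10))
# [('multiply', 30), ('init', 10), ('main', 10)]
print(cpi(cpu_clk_unhalted=3_000_000, inst_retired=2_000_000))  # 1.5
```

A lower CPI means more instructions complete per cycle, so comparing CPI across hotspots helps separate functions that are simply executed often from functions that execute inefficiently.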
MICROARCHITECTURE ANALYSIS
Run General Exploration analysis to triage hardware issues in your application. This analysis type collects a complete list of events for analyzing a typical client application. See tutorials for Linux host - C++ sample code | Windows host - C++ sample code.
Use Memory Access analysis to identify memory-related issues, such as NUMA problems and bandwidth-limited accesses, and to attribute performance events to memory objects (data structures). This attribution is possible because the analysis instruments memory allocations and de-allocations and obtains static and global variables from symbol information. For Linux targets, you can also enable memory object analysis and view performance metrics per variable, data structure, or array.
Narrow down your hardware analysis by focusing on specific hardware issues, such as inefficient memory accesses, low bandwidth, and so on.
For systems with Intel Software Guard Extensions (Intel SGX) enabled, run SGX Hotspots analysis to identify performance-critical program units inside security enclaves. This analysis type uses the INST_RETIRED.PREC_DIST hardware event, which emulates precise clockticks and is required for analysis on systems with Intel SGX enabled.
For Intel processors that support Intel Transactional Synchronization Extensions (Intel TSX), run the TSX Exploration and TSX Hotspots analysis types to measure transactional success and analyze the causes of transactional aborts.
(Figure: General Exploration Analysis)
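General Exploration results are organized as a top-down hierarchy whose first level splits pipeline issue slots into Front-End Bound, Bad Speculation, Retiring, and Back-End Bound. The Python sketch below computes that first level from raw event counts using the published Top-Down Microarchitecture Analysis formulas for a 4-wide Intel Core pipeline. The pipeline width and the formulas are assumptions that vary by processor generation, and `top_down_level1` is a hypothetical helper, not how VTune Amplifier is implemented.

```python
def top_down_level1(ev, pipeline_width=4):
    """Level-1 Top-Down breakdown, as fractions of all issue slots.

    ev maps Intel event names to raw counts. Formulas assume a
    pipeline_width-wide Intel Core pipeline (assumption: 4).
    """
    slots = pipeline_width * ev["CPU_CLK_UNHALTED.THREAD"]
    # Slots where the front end delivered no uop although the back end could accept one.
    frontend = ev["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
    # Uops issued but never retired, plus slots lost while recovering from mispredicts.
    bad_spec = (ev["UOPS_ISSUED.ANY"] - ev["UOPS_RETIRED.RETIRE_SLOTS"]
                + pipeline_width * ev["INT_MISC.RECOVERY_CYCLES"]) / slots
    # Slots that did useful, retired work.
    retiring = ev["UOPS_RETIRED.RETIRE_SLOTS"] / slots
    # Whatever remains is attributed to back-end stalls (memory or core).
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {"Front-End Bound": frontend, "Bad Speculation": bad_spec,
            "Retiring": retiring, "Back-End Bound": backend}


counts = {
    "CPU_CLK_UNHALTED.THREAD": 1000,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 800,
    "UOPS_ISSUED.ANY": 1800,
    "UOPS_RETIRED.RETIRE_SLOTS": 1600,
    "INT_MISC.RECOVERY_CYCLES": 50,
}
print({k: round(v, 3) for k, v in top_down_level1(counts).items()})
# {'Front-End Bound': 0.2, 'Bad Speculation': 0.1, 'Retiring': 0.4, 'Back-End Bound': 0.3}
```

Because the four categories always sum to 1, a large Back-End Bound share is the usual signal to drill into memory-related metrics with Memory Access analysis.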
PLATFORM ANALYSIS
Run CPU/GPU Concurrency analysis to identify code regions where your application is CPU-bound or GPU-bound. For GPU-bound applications running on Intel HD Graphics, collect GPU hardware events to estimate how effectively Processor Graphics is used. Analyze hot Intel® Media SDK programs and OpenCL™ kernels running on the GPU. For OpenCL application analysis, use the Architecture Diagram to explore GPU hardware metrics per GPU architecture block.
(Figure: GPU Analysis)
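One simple way to reason about "CPU-bound versus GPU-bound" is to split a timeline into intervals and ask which device was busy in each. The Python sketch below applies that toy model: if only the CPU is busy, the GPU is presumably waiting on it (CPU-bound); if only the GPU is busy, the CPU is waiting (GPU-bound). The input format and the `classify_intervals` helper are hypothetical, and this is a simplification for illustration, not the algorithm VTune Amplifier uses.

```python
def classify_intervals(intervals):
    """Total the time spent in each CPU/GPU activity state.

    intervals: list of (duration_ms, cpu_busy, gpu_busy) tuples
    (hypothetical format). Returns milliseconds per label.
    """
    totals = {"CPU-bound": 0.0, "GPU-bound": 0.0, "Overlapped": 0.0, "Idle": 0.0}
    for duration, cpu_busy, gpu_busy in intervals:
        if cpu_busy and gpu_busy:
            totals["Overlapped"] += duration  # both devices working concurrently
        elif cpu_busy:
            totals["CPU-bound"] += duration   # GPU idle, waiting for CPU work
        elif gpu_busy:
            totals["GPU-bound"] += duration   # CPU idle, waiting for the GPU
        else:
            totals["Idle"] += duration
    return totals


timeline = [(5, True, False), (10, False, True), (3, True, True), (2, False, False)]
print(classify_intervals(timeline))
# {'CPU-bound': 5.0, 'GPU-bound': 10.0, 'Overlapped': 3.0, 'Idle': 2.0}
```

In this toy timeline the GPU-bound share dominates, which under the model above would point toward collecting GPU hardware events next.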
SOURCE ANALYSIS
Double-click a hotspot function to drill down to the source code and analyze performance per source line or assembly instruction. By default, the hottest line is highlighted. For help on an assembly instruction, right-click the instruction in the Assembly pane and select Instruction Reference from the context menu.
(Figure: Source View)
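Per-line attribution follows the same counting idea as function-level hotspots, only keyed by source location. The Python sketch below assumes each sample's instruction address has already been mapped to a `(file, line)` pair through debug information, then reports the hottest lines and their share of all samples. The `hottest_lines` helper and the file names are hypothetical.

```python
from collections import Counter


def hottest_lines(line_samples, top=3):
    """Return the `top` hottest (file, line) locations.

    line_samples: iterable of (source_file, line_number) pairs, one per
    sample (address-to-line mapping assumed done already).
    Returns ((file, line), sample_count, share_of_total) tuples.
    """
    counts = Counter(line_samples)
    total = sum(counts.values())
    return [(loc, n, n / total) for loc, n in counts.most_common(top)]


samples = [("matrix.c", 42)] * 6 + [("matrix.c", 40)] * 3 + [("util.c", 7)]
for (path, line), n, share in hottest_lines(samples, top=2):
    print(f"{path}:{line}  {n} samples  {share:.0%}")
# matrix.c:42  6 samples  60%
# matrix.c:40  3 samples  30%
```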
OpenMP* AND MPI ANALYSIS
Run one of the Algorithm analysis types to collect OpenMP or MPI data for applications that use the OpenMP or MPI runtime libraries. Start with the OpenMP Analysis section of the Summary window to identify inefficiencies in the parallelization of your OpenMP application. Analyze the Potential Gain metric per OpenMP region to understand the maximum time that could be saved if the region were optimized to have no load imbalance, assuming no runtime overhead. For hybrid OpenMP and MPI applications, explore OpenMP efficiency metrics for the MPI processes lying on the critical path.
(Figure: OpenMP and MPI Analysis)
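The idea behind Potential Gain can be sketched with a simplified model of a single parallel region: all threads start together, the region ends at an implicit barrier when the slowest thread finishes, and with perfect balance and zero runtime overhead every thread would finish at the mean busy time. The Python sketch below computes that difference. The single-barrier model and the `potential_gain` helper are assumptions for illustration; VTune Amplifier's own computation may account for more detail (for example, multiple barriers or region instances).

```python
def potential_gain(thread_busy_times):
    """Time recoverable by removing load imbalance in one parallel region.

    thread_busy_times: busy time of each thread in the region (seconds),
    assuming a common start and a single barrier at the end.
    """
    if not thread_busy_times:
        return 0.0
    ideal = sum(thread_busy_times) / len(thread_busy_times)  # perfectly balanced finish
    actual = max(thread_busy_times)                          # slowest thread sets region time
    return actual - ideal


# Hypothetical per-region data: region name -> per-thread busy time (s).
regions = {"solve": [4.0, 2.0, 2.0, 0.0], "init": [1.0, 1.0, 1.0, 1.0]}
for name, times in regions.items():
    print(f"{name}: potential gain {potential_gain(times):.2f} s")
# solve: potential gain 2.00 s
# init: potential gain 0.00 s
```

Sorting regions by this value, rather than by raw region time, points first at the regions where rebalancing work (for example, a different loop schedule) is likely to pay off most.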
CUSTOM ANALYSIS
Select the Custom Analysis branch in the analysis tree to create your own analysis configurations using any of the available VTune Amplifier data collectors. Run your own custom collector from the VTune Amplifier to get the aggregated performance data from your custom collection and the VTune Amplifier analysis in the same result. You can also import performance data collected by your own or a third-party collector into a VTune Amplifier result that was collected in parallel with your external collection. Use the Import from CSV button to integrate the external data into the result.
(Figure: Custom Analysis)
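An external collector typically writes its records to a CSV file that is then brought in with Import from CSV. The Python sketch below shows only the mechanics of emitting such a file with the standard `csv` module. The column names (`name`, `start`, `end`, `pid`, `tid`), the timestamp units, and the `external_events_to_csv` helper are hypothetical; check the VTune Amplifier Help for the exact header and timestamp format the importer expects, and make sure the timestamps can be aligned with the parallel VTune Amplifier collection.

```python
import csv
import io


def external_events_to_csv(events, fh):
    """Write externally collected interval records as CSV.

    events: iterable of dicts with the hypothetical keys listed in
    `fields`. fh: any writable text file object.
    """
    fields = ["name", "start", "end", "pid", "tid"]  # hypothetical column layout
    writer = csv.DictWriter(fh, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for event in events:
        writer.writerow({k: event[k] for k in fields})


buf = io.StringIO()
external_events_to_csv(
    [{"name": "io_wait", "start": 1000, "end": 1500, "pid": 42, "tid": 43}],
    buf,
)
print(buf.getvalue(), end="")
# name,start,end,pid,tid
# io_wait,1000,1500,42,43
```

Writing to any file-like object keeps the helper easy to test in memory; in a real collector you would pass an opened file instead of `io.StringIO`.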
For a detailed list of product features, see the Intel VTune Amplifier Help: Windows Host | Linux Host.
Legal Information
Intel, VTune and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.
© 2015 Intel Corporation