While Concurrency analysis helps identify where your application is not parallel, Locks and Waits analysis helps identify the cause of ineffective processor utilization. One of the most common problems is threads waiting too long on synchronization objects (locks). Performance suffers when waits occur while cores are underutilized.
Locks and Waits analysis uses user-mode sampling and tracing collection. With this analysis, you can estimate the impact of each synchronization object on the application and see how long the application waited on each synchronization object or in blocking APIs, such as sleep and blocking I/O.
The Intel® VTune™ Amplifier supports two groups of synchronization objects:
1) objects usually used for synchronization between threads, such as Mutex or Semaphore;
2) objects associated with waits on I/O operations, such as Stream.
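To make the idea of Wait time per synchronization object concrete, the following minimal Python sketch wraps a mutex so that it accumulates the time callers spend blocked while acquiring it. It only illustrates the kind of measurement the analysis reports; it is not how the VTune Amplifier collects data, and the names `TimedLock`, `worker`, and `run_demo` are hypothetical.

```python
import threading
import time


class TimedLock:
    """A mutex that records how long callers were blocked acquiring it.

    Hypothetical helper for illustration only.
    """

    def __init__(self, name):
        self.name = name
        self.wait_seconds = 0.0   # total time spent blocked in acquire()
        self.acquisitions = 0     # number of successful acquisitions
        self._lock = threading.Lock()

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()      # blocks while another thread holds the lock
        # The counters are updated while the lock is held, so no extra
        # synchronization is needed to protect them.
        self.wait_seconds += time.perf_counter() - start
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._lock.release()
        return False              # do not swallow exceptions


def worker(lock, hold_seconds):
    with lock:
        time.sleep(hold_seconds)  # simulate work done while holding the lock


def run_demo(num_threads=4, hold_seconds=0.05):
    """Start several threads that contend for one lock and return the lock."""
    lock = TimedLock("shared_mutex")
    threads = [
        threading.Thread(target=worker, args=(lock, hold_seconds))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return lock


if __name__ == "__main__":
    demo_lock = run_demo()
    print(f"{demo_lock.name}: waited {demo_lock.wait_seconds:.3f} s "
          f"over {demo_lock.acquisitions} acquisitions")
```

Because the threads queue behind one another, the accumulated wait grows much faster than the hold time of any single thread. That is the pattern to look for in objects with a long Wait time.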
You can choose to view Locks and Waits analysis results in any of the following viewpoints:
Viewpoint | Description |
---|---|
Hotspots | Helps identify hotspots - code regions in the application that consume a lot of CPU time. |
Hotspots by CPU Usage | Helps identify hotspots - code regions in the application that consume a lot of CPU time. CPU time is broken down into CPU usage states: idle, poor, fair, and good. |
Hotspots by Thread Concurrency | Helps identify hotspots - code regions in the application that consume a lot of CPU time. CPU time is broken down into thread concurrency states: idle, poor, fair, good, and over. |
Locks and Waits | Shows how your application is utilizing available CPU cores and helps identify the cause of ineffective utilization, for example: threads waiting too long on synchronization objects (locks), I/O, or timers while CPU cores are underutilized. CPU time is represented by bars colored according to the CPU utilization level during the wait. |
By default, the VTune Amplifier displays the results of Locks and Waits analysis in the Locks and Waits viewpoint where:
The Summary window displays statistics on the overall application execution in terms of Wait time and processor utilization during the wait.
The Bottom-up pane displays hotspot synchronization objects in the bottom-up tree, with Wait time and CPU utilization per synchronization object.
The Top-down Tree pane displays hotspot synchronization objects in the top-down tree, with Wait time and utilization reported for a wait function only (Self time) and for a wait function together with its children (Total time).
The Call Stack pane displays stacks for each wait function.
The Timeline pane displays thread activity and transitions.
What's Next
Identify the objects that caused contention and go to the source code to fix the problem. Concentrate your tuning on objects with long Wait time where the system is poorly utilized (red bars) during the wait. Consider adding parallelism, rebalancing, or reducing contention. Ideal utilization (green bars) occurs when the number of running threads equals the number of available cores.
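As one possible way to reduce contention, the sketch below contrasts a version that takes a shared lock on every update with a version that accumulates a thread-local partial result and takes the lock once per thread. Python is used only for brevity, and all function names are hypothetical. The sketch assumes that `os.cpu_count()`, which reports logical CPUs, is an acceptable stand-in for the number of available cores.

```python
import os
import threading


def run_threads(work, chunks):
    """Run `work(chunk)` on one thread per chunk and wait for all of them."""
    threads = [threading.Thread(target=work, args=(chunk,)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def split_evenly(values, parts):
    """Split `values` into `parts` chunks whose sizes differ by at most one."""
    return [values[i::parts] for i in range(parts)]


def sum_lock_per_update(chunks):
    """High contention: every single addition acquires the shared lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for value in chunk:
            with lock:
                total += value

    run_threads(work, chunks)
    return total


def sum_local_then_merge(chunks):
    """Low contention: each thread acquires the shared lock exactly once."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        partial = sum(chunk)   # touches no shared state
        with lock:
            total += partial   # single short critical section per thread

    run_threads(work, chunks)
    return total


if __name__ == "__main__":
    # Assumption: one worker thread per logical CPU reported by the OS.
    workers = os.cpu_count() or 1
    data = list(range(100_000))
    chunks = split_evenly(data, workers)
    assert sum_lock_per_update(chunks) == sum_local_then_merge(chunks)
```

Both functions compute the same result. The second one shortens the time spent inside the critical section, which is the kind of change that should lower the Wait time attributed to the lock.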
After modifying your code, run a comparison analysis to measure the performance gain and to identify further areas for improvement.