Sometimes the loop control is spread across complex control flow. Using Intel TBB or Intel Cilk Plus in this situation requires more features than simple loops do. Note that the task body must not access any of the auto variables defined within the annotation site, because they may be destroyed before or while the task runs. Consider this serial code:
extern char a[];
int previousEnd = -1;
ANNOTATE_SITE_BEGIN(sitename);
for (int i=0; i<=100; i++) {
    if (!a[i] || i==100) {
        ANNOTATE_TASK_BEGIN(do_something);
        DoSomething(previousEnd+1,i);
        ANNOTATE_TASK_END();
        previousEnd=i;
    }
}
ANNOTATE_SITE_END();
In general, counted loops have better scalability than loops with complex iteration control, because the complex control is inherently sequential. Consider reformulating your code as a counted loop if possible.
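One way to read this advice, sketched below in standard C++ under stated assumptions: run the inherently sequential boundary scan once, record the ranges it finds, and then iterate over those ranges with a plain counted loop. The helper names CollectRanges and ForEachRange are hypothetical and do not come from any library; the second loop is shown serially, and it has the shape that a construct such as cilk_for or tbb::parallel_for is designed to take.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: scan the buffer once, serially, and record each
// [begin, end] range that the original loop would pass to DoSomething().
// A range ends at a zero byte or at the last index.
std::vector<std::pair<int, int>> CollectRanges(const char* a, int last) {
    assert(last >= 0);
    std::vector<std::pair<int, int>> ranges;
    int previousEnd = -1;
    for (int i = 0; i <= last; i++) {
        if (!a[i] || i == last) {
            ranges.emplace_back(previousEnd + 1, i);
            previousEnd = i;
        }
    }
    return ranges;
}

// Hypothetical driver: a counted loop over the recorded ranges. Each
// iteration depends only on its index k, so the iterations are independent
// as long as work() itself touches no shared state.
template <typename Work>
void ForEachRange(const std::vector<std::pair<int, int>>& ranges, Work work) {
    for (std::size_t k = 0; k < ranges.size(); k++) {
        work(ranges[k].first, ranges[k].second);
    }
}
```

With these helpers, the original loop would become ForEachRange(CollectRanges(a, 100), DoSomething). The extra scan is only worthwhile when each call to DoSomething() does much more work than finding the boundaries.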
One approach to parallelizing the annotated loop above is simply to spawn each call to DoSomething():
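#include <cilk/cilk.h>
...
extern char a[];
int previousEnd = -1;
for (int i=0; i<=100; i++) {
    if (!a[i] || i==100) {
        // Arguments are evaluated and copied before the spawn.
        cilk_spawn DoSomething(previousEnd+1,i);
        // Update only when a task was spawned, as in the serial code.
        previousEnd=i;
    }
}
// Wait for all spawned calls to finish.
cilk_sync;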
It is important that the parameters to DoSomething be passed by value, not by reference, because previousEnd and i can change before or while the spawned task runs.
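The same rule can be illustrated without Cilk Plus. The following is a minimal sketch in standard C++, in which std::thread stands in for cilk_spawn and join() for cilk_sync; that substitution is an assumption made for illustration, not the method described here. ProcessRanges, g_seen, g_mutex, and the body of DoSomething are likewise hypothetical. Each task captures copies of its bounds, so later changes to previousEnd and i cannot affect it.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for DoSomething(): records the range it was given.
std::mutex g_mutex;
std::vector<std::pair<int, int>> g_seen;

void DoSomething(int begin, int end) {
    std::lock_guard<std::mutex> lock(g_mutex);
    g_seen.emplace_back(begin, end);
}

// Hypothetical driver mirroring the loop above, with one thread per task.
void ProcessRanges(const char* a, int last) {
    assert(last >= 0);
    std::vector<std::thread> tasks;
    int previousEnd = -1;
    for (int i = 0; i <= last; i++) {
        if (!a[i] || i == last) {
            int begin = previousEnd + 1;
            // Capture by value: the task owns private copies of begin and i.
            // A by-reference capture ([&]) could read them after the loop
            // has changed them or after they have gone out of scope.
            tasks.emplace_back([begin, i] { DoSomething(begin, i); });
            previousEnd = i;
        }
    }
    // Wait for every task before returning; plays the role of cilk_sync.
    for (std::thread& t : tasks) {
        t.join();
    }
}
```

The tasks may run in any order, so the entries in g_seen are not guaranteed to appear in index order. One thread per task is used only to keep the sketch short; a task scheduler would normally reuse a fixed pool of worker threads.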