Unit 5 Part2
TRANSPARENT SCALABILITY
Coordinate the execution of multiple threads
• CUDA allows threads in the same block to coordinate their activities
using a barrier synchronization function __syncthreads().
• When a thread calls __syncthreads(), it is held at the calling location
until every thread in its block reaches the same location.
• It ensures that all threads in a block have completed a phase of their
execution of the kernel before any of them can move on to the next
phase.
[Figure: barrier synchronization with __syncthreads()]
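The two-phase behavior described above can be sketched with a small kernel. This is only an illustrative example, not code from the course: the names reverseInBlock, runReverse, and BLOCK_SIZE are hypothetical, and the sketch assumes a single block of 64 threads. Each thread writes one element into shared memory (phase 1), waits at the barrier, and then reads an element written by a different thread (phase 2). Without the barrier, a thread could read a shared-memory slot before its owner has written it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 64  // hypothetical block size chosen for this sketch

// Reverses BLOCK_SIZE integers in place using one thread block.
__global__ void reverseInBlock(int *data)
{
    __shared__ int tile[BLOCK_SIZE];
    int t = threadIdx.x;

    // Phase 1: each thread copies its own element into shared memory.
    tile[t] = data[t];

    // Barrier: no thread continues until every thread in the block
    // has finished phase 1.
    __syncthreads();

    // Phase 2: each thread reads an element written by another thread.
    data[t] = tile[BLOCK_SIZE - 1 - t];
}

// Hypothetical host helper: runs the kernel on 0..BLOCK_SIZE-1 and
// returns true if the result is the reversed sequence.
bool runReverse()
{
    int h[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; ++i) h[i] = i;

    int *d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    reverseInBlock<<<1, BLOCK_SIZE>>>(d);  // one block, BLOCK_SIZE threads

    // This blocking copy waits for the kernel to finish first.
    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < BLOCK_SIZE; ++i)
        if (h[i] != BLOCK_SIZE - 1 - i) return false;
    return true;
}

int main()
{
    printf("%s\n", runReverse() ? "reversed correctly" : "failed");
    return 0;
}
```

Note that the barrier only coordinates threads within one block. Every thread in the block must reach the same __syncthreads() call, so it should not be placed inside a branch that only some threads of the block take.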