Computing matrix addition and multiplication
Let’s dive into matrix addition and matrix multiplication, two operations that look similar at first glance but differ quite a bit in practice, especially in how computers handle them – and particularly when programming with CUDA.
Matrix addition is straightforward. Imagine we have two matrices, A and B, of the same size. We simply add each element of A to the corresponding element of B: C[i][j] = A[i][j] + B[i][j] for every row i and column j. Because each element’s sum depends on nothing else, this is an embarrassingly parallel task: every calculation is independent of all the others, so in theory we could compute them all at once without coordinating or waiting on results from other parts of the matrix.
On the GPU, matrix addition is highly efficient because there is no need for synchronization between threads...