Moving sequential code to the GPU
We learned that moving data to and from the GPU can be costly, and that we can overlap those transfers with computation to reduce the time spent waiting on them. However, there are times when we need to perform an intermediate sequential step between two GPU processing phases. We then have to decide whether to move the data out of GPU memory so that the CPU can run that step, or to move our sequential code onto the GPU, even though it will not fully utilize the available resources.
Although running sequential code on the GPU may seem a little counterintuitive at first, this is a very legitimate question to ask. It is not a matter of right or wrong, but rather of which option will execute fastest and what the associated cost is, even if that cost is maintainability.
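The decision above can be sketched as a back-of-the-envelope cost model. This is only an illustration under simplifying assumptions: the function name, its parameters, and all the numbers are hypothetical, transfers are modeled as a fixed latency plus size divided by bandwidth, and nothing is overlapped. Real values would have to come from profiling your own application.

```python
def choose_sequential_placement(bytes_moved, bandwidth_bytes_per_s,
                                transfer_latency_s, cpu_step_s, gpu_step_s):
    """Compare running a sequential step on the host versus on the GPU.

    All arguments are hypothetical inputs you would measure yourself:
      bytes_moved           - data copied in each direction
      bandwidth_bytes_per_s - sustained host<->device bandwidth
      transfer_latency_s    - fixed overhead per transfer
      cpu_step_s            - time of the sequential step on the CPU
      gpu_step_s            - time of the same step on the GPU
                              (typically slower, since it is sequential)
    Returns (placement, host_round_trip_s).
    """
    # Simplified model: one copy out of GPU memory and one copy back in.
    one_way_s = transfer_latency_s + bytes_moved / bandwidth_bytes_per_s
    host_round_trip_s = 2 * one_way_s + cpu_step_s

    # Keep the step on the GPU only if it beats the full round trip.
    placement = "gpu" if gpu_step_s < host_round_trip_s else "cpu"
    return placement, host_round_trip_s


if __name__ == "__main__":
    # Large buffer: the round trip dominates, so the slow GPU step still wins.
    print(choose_sequential_placement(1e9, 1e10, 0.0, 0.01, 0.15))
    # Small buffer: the copies are cheap, so the CPU is the better choice.
    print(choose_sequential_placement(1e6, 1e10, 0.0, 0.01, 0.15))
```

With these made-up figures, the large buffer favors keeping the step on the GPU and the small buffer favors the CPU. The model deliberately ignores the maintainability cost mentioned above, which has to be weighed separately.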
One important thing to keep in mind, based on the measurements we observed in Chapter 8, is that typically we can hide the computation time by correctly partitioning the data, in that the total execution time is...