Six-stage CUDA parallel reduction optimization: from basic global memory to warp shuffles and bank conflict avoidance
Updated May 16, 2025 - Jupyter Notebook
Repository of the lab9 assignment for the Parallel Programming course.