GitHub - fajxc/matrixMultiplication-CUDA

GPU-Accelerated Matrix Multiplication

This project implements matrix multiplication on both CPU (C) and GPU (CUDA) and compares performance across different matrix sizes. The goal was to see how GPU parallelism impacts runtime compared to a straightforward CPU implementation.

What I Did

Wrote a simple CPU baseline using triple nested loops.

Implemented a CUDA kernel where each thread computes one output element.

Benchmarked CPU vs GPU for matrix sizes up to 1024×1024.

Logged results into CSV and plotted runtimes and speedup.
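The steps above can be sketched in plain C. This is a minimal illustration, not the code from matmul_cpu.c or matmul_gpu.cu: the function names (`matmul_element`, `matmul_cpu`, `time_matmul_cpu`, `write_csv_row`), the row-major `float` layout, and the `n,seconds` CSV columns are all assumptions made for the example. `matmul_element` is the work for a single output element, which is what one thread would do in a one-thread-per-element CUDA kernel (there, `row` and `col` would be derived from `blockIdx`, `blockDim`, and `threadIdx`).

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* One output element: dot product of row `row` of A with column `col` of B.
   Matrices are n x n, stored row-major (an assumption of this sketch). */
float matmul_element(const float *A, const float *B, size_t n,
                     size_t row, size_t col)
{
    float sum = 0.0f;
    for (size_t k = 0; k < n; ++k)
        sum += A[row * n + k] * B[k * n + col];
    return sum;
}

/* CPU baseline: loops over i and j here, and over k inside matmul_element,
   giving the usual triple nested loop. */
void matmul_cpu(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            C[i * n + j] = matmul_element(A, B, n, i, j);
}

/* Time one CPU multiplication and return the elapsed CPU time in seconds. */
double time_matmul_cpu(const float *A, const float *B, float *C, size_t n)
{
    clock_t start = clock();
    matmul_cpu(A, B, C, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Write one CSV row "n,seconds"; this column layout is hypothetical. */
int write_csv_row(FILE *out, size_t n, double seconds)
{
    return fprintf(out, "%zu,%.6f\n", n, seconds);
}
```

A caller would allocate three `n * n` float arrays, fill `A` and `B`, call `time_matmul_cpu`, and pass the result to `write_csv_row` once per matrix size.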

What I Learned

The CPU is fine (or sometimes faster) for small matrices, but the triple-loop runtime grows as O(n³) in the matrix dimension n, so it climbs quickly.

GPUs crush large matrix multiplies because each output element is independent, so thousands of threads can work in parallel.

Memory transfers between CPU and GPU matter, but once the problem is large enough the GPU dominates.
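One rough way to see why transfers stop mattering at larger sizes is to compare how the arithmetic and the copied data scale. The sketch below is a back-of-the-envelope model, not a measurement from this project: it assumes square n×n matrices of 4-byte `float` values, counts one multiply and one add per inner-loop step, and assumes A and B are copied to the GPU and C is copied back. The function names are hypothetical.

```c
#include <stddef.h>

/* Floating-point operations for an n x n times n x n multiply:
   n*n outputs, each needing n multiplies and n adds. */
double matmul_flops(size_t n)
{
    double d = (double)n;
    return 2.0 * d * d * d;
}

/* Bytes crossing the CPU-GPU link under the stated assumptions:
   A and B copied to the device, C copied back, all float. */
double matmul_transfer_bytes(size_t n)
{
    double d = (double)n;
    return 3.0 * d * d * (double)sizeof(float);
}

/* Arithmetic per transferred byte; with 4-byte floats this is n / 6. */
double matmul_flops_per_byte(size_t n)
{
    return matmul_flops(n) / matmul_transfer_bytes(n);
}
```

Under these assumptions the work grows as n³ while the transferred data grows as n², so the ratio grows linearly: about 5.3 operations per byte at n = 32 and about 170.7 at n = 1024.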

README.md
matmul_cpu.c
matmul_cpu.exe
matmul_gpu.cu
matmul_gpu.exe
plotResults.py
results.csv
runtime.png
speedup.png
