Skip to content

Commit f440c69

Browse files
sbryngelsonclaude
andcommitted
Fix 2D ||b||_2 indexing: read from k=Ng plane, not k=0
For 2D grids with ghost cells, the memory layout uses Sz = 1 + 2*Ng planes, with actual data stored at plane k=Ng (the middle plane). Ghost planes k=0 and k=2 contain boundary data. The GPU reduction for ||b||_2 was using `idx = j*stride + i` which reads from k=0 (ghost plane, contains zeros/garbage). This caused the adaptive mode to compute wrong relative residuals and exit early. Fixed by adding `k_plane_offset = Ng * plane_stride` to the 2D index calculation, matching how the original CPU code and 3D code work. Bug introduced: de074c0 (GPU reduction for ||b||_2) Root cause: 2D memory layout wasn't accounted for in GPU reduction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent d4eb9dd commit f440c69

File tree

1 file changed

+8
-2
lines changed

1 file changed

+8
-2
lines changed

src/poisson_solver_multigrid.cpp

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1891,11 +1891,14 @@ int MultigridPoissonSolver::solve_device(double* rhs_present, double* p_present,
18911891
double b_sum_sq = 0.0;
18921892

18931893
if (is_2d) {
1894+
// 2D data lives at plane k=Ng (middle z-plane), not k=0
1895+
// Memory layout: Sz = 1 + 2*Ng, data at plane Ng
1896+
const int k_plane_offset = Ng * plane_stride;
18941897
#pragma omp target teams distribute parallel for collapse(2) \
18951898
is_device_ptr(f_ptr) reduction(+: b_sum_sq)
18961899
for (int j = Ng; j < Ny + Ng; ++j) {
18971900
for (int i = Ng; i < Nx + Ng; ++i) {
1898-
int idx = j * stride + i;
1901+
int idx = k_plane_offset + j * stride + i;
18991902
double val = f_ptr[idx];
19001903
b_sum_sq += val * val;
19011904
}
@@ -2027,11 +2030,14 @@ int MultigridPoissonSolver::solve_device(double* rhs_present, double* p_present,
20272030
int device = omp_get_default_device();
20282031
const double* f_dev = static_cast<const double*>(omp_get_mapped_ptr(f_level0_ptr_, device));
20292032
if (is_2d_gpu) {
2033+
// 2D data lives at plane k=Ng (middle z-plane), not k=0
2034+
// Memory layout: Sz = 1 + 2*Ng, data at plane Ng
2035+
const int k_plane_offset = Ng * plane_stride_gpu;
20302036
#pragma omp target teams distribute parallel for collapse(2) \
20312037
is_device_ptr(f_dev) reduction(max: b_inf_local) reduction(+: b_sum_sq)
20322038
for (int j = Ng; j < Ny_g + Ng; ++j) {
20332039
for (int i = Ng; i < Nx_g + Ng; ++i) {
2034-
int idx = j * stride_gpu + i;
2040+
int idx = k_plane_offset + j * stride_gpu + i;
20352041
double val = f_dev[idx];
20362042
b_inf_local = std::max(b_inf_local, std::abs(val));
20372043
b_sum_sq += val * val;

0 commit comments

Comments
 (0)