Fix 2D ||b||_2 indexing: read from k=Ng plane, not k=0

sbryngelson · claude · sbryngelson · commit f440c6923331 · 2026-01-27T18:50:54.000-05:00
For 2D grids with ghost cells, the memory layout uses Sz = 1 + 2*Ng planes, with the actual data stored at plane k=Ng (the middle plane). The remaining planes (k=0 and k=2 when Ng=1) are ghost planes that contain boundary data.

The GPU reduction for ||b||_2 was using `idx = j*stride + i`, which reads from k=0 (a ghost plane containing zeros or garbage). This caused the adaptive mode to compute wrong relative residuals and exit early.

Fixed by adding `k_plane_offset = Ng * plane_stride` to the 2D index calculation, matching how the original CPU code and the 3D code work.

Bug introduced: de074c0 (GPU reduction for ||b||_2)
Root cause: 2D memory layout wasn't accounted for in GPU reduction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
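
The indexing issue can be reproduced on the host with a minimal sketch. This is not the solver's code: `Layout2D`, `make_unit_rhs`, and `sum_sq_on_plane` are hypothetical names, and the definitions `stride = Nx + 2*Ng` and `plane_stride = stride * (Ny + 2*Ng)` are assumptions about the layout that the commit does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout descriptor. The stride formulas are assumptions,
// chosen to be consistent with Sz = 1 + 2*Ng from the commit message.
struct Layout2D {
    int Nx, Ny, Ng;
    int stride() const { return Nx + 2 * Ng; }                    // elements per row
    int plane_stride() const { return stride() * (Ny + 2 * Ng); } // elements per z-plane
    int Sz() const { return 1 + 2 * Ng; }                         // z-planes in 2D mode
    std::size_t size() const {
        return static_cast<std::size_t>(plane_stride()) * static_cast<std::size_t>(Sz());
    }
};

// Build a field that is 1.0 on the interior of plane k=Ng and 0.0 elsewhere,
// so reading the wrong plane is easy to detect.
std::vector<double> make_unit_rhs(const Layout2D& L) {
    std::vector<double> f(L.size(), 0.0);
    const int k_plane_offset = L.Ng * L.plane_stride();
    for (int j = L.Ng; j < L.Ny + L.Ng; ++j) {
        for (int i = L.Ng; i < L.Nx + L.Ng; ++i) {
            f[static_cast<std::size_t>(k_plane_offset + j * L.stride() + i)] = 1.0;
        }
    }
    return f;
}

// Sum of squares over interior cells, read from plane k=k_plane.
// k_plane = Ng corresponds to the fixed index; k_plane = 0 to the old one.
double sum_sq_on_plane(const std::vector<double>& f, const Layout2D& L, int k_plane) {
    assert(f.size() == L.size());
    const int k_plane_offset = k_plane * L.plane_stride();
    double sum = 0.0;
    for (int j = L.Ng; j < L.Ny + L.Ng; ++j) {
        for (int i = L.Ng; i < L.Nx + L.Ng; ++i) {
            const double val = f[static_cast<std::size_t>(k_plane_offset + j * L.stride() + i)];
            sum += val * val;
        }
    }
    return sum;
}
```

Under these assumptions, a 4x4 grid with Ng=1 gives a sum of squares of 16 when read from plane k=Ng and 0 when read from plane k=0, which is the kind of undercounted ||b||_2 that skews a relative residual.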
diff --git a/src/poisson_solver_multigrid.cpp b/src/poisson_solver_multigrid.cpp
@@ -1891,11 +1891,14 @@ int MultigridPoissonSolver::solve_device(double* rhs_present, double* p_present,
                 double b_sum_sq = 0.0;
 
                 if (is_2d) {
+                    // 2D data lives at plane k=Ng (middle z-plane), not k=0
+                    // Memory layout: Sz = 1 + 2*Ng, data at plane Ng
+                    const int k_plane_offset = Ng * plane_stride;
                     #pragma omp target teams distribute parallel for collapse(2) \
                         is_device_ptr(f_ptr) reduction(+: b_sum_sq)
                     for (int j = Ng; j < Ny + Ng; ++j) {
                         for (int i = Ng; i < Nx + Ng; ++i) {
-                            int idx = j * stride + i;
+                            int idx = k_plane_offset + j * stride + i;
                             double val = f_ptr[idx];
                             b_sum_sq += val * val;
                         }
@@ -2027,11 +2030,14 @@ int MultigridPoissonSolver::solve_device(double* rhs_present, double* p_present,
         int device = omp_get_default_device();
         const double* f_dev = static_cast<const double*>(omp_get_mapped_ptr(f_level0_ptr_, device));
         if (is_2d_gpu) {
+            // 2D data lives at plane k=Ng (middle z-plane), not k=0
+            // Memory layout: Sz = 1 + 2*Ng, data at plane Ng
+            const int k_plane_offset = Ng * plane_stride_gpu;
             #pragma omp target teams distribute parallel for collapse(2) \
                 is_device_ptr(f_dev) reduction(max: b_inf_local) reduction(+: b_sum_sq)
             for (int j = Ng; j < Ny_g + Ng; ++j) {
                 for (int i = Ng; i < Nx_g + Ng; ++i) {
-                    int idx = j * stride_gpu + i;
+                    int idx = k_plane_offset + j * stride_gpu + i;
                     double val = f_dev[idx];
                     b_inf_local = std::max(b_inf_local, std::abs(val));
                     b_sum_sq += val * val;