Skip to content

Commit 28fbbf1

Browse files
sbryngelsonclaude
andcommitted
Fix GPU ||b||_2 computation: sync and NVHPC workaround
Three fixes for the adaptive MG convergence check on GPU: 1. Add cudaDeviceSynchronize before ||b||_2 computation The data copy from rhs_present to f_level0_ptr_ may not be complete without explicit sync, causing the reduction to read stale/garbage data. 2. Use f_level0_ptr_ instead of f_ptrs_[0] for omp_get_mapped_ptr Vector element access (f_ptrs_[0]) can return stale addresses in NVHPC target regions. Member pointer f_level0_ptr_ is set once and stable. 3. Add sanity check for garbage b_l2_ values If the reduction returns NaN/Inf or suspiciously small values, set b_l2_=0 to force the convergence check to use raw residual instead of relative. This prevents early exit on bad reduction results. These fixes address the GPU CI test failure where GalileanStageBreakdownTest was failing with 775x divergence ratios instead of expected <3x. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 2476230 commit 28fbbf1

File tree

1 file changed

+11
-2
lines changed

1 file changed

+11
-2
lines changed

src/poisson_solver_multigrid.cpp

Lines changed: 11 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1871,6 +1871,8 @@ int MultigridPoissonSolver::solve_device(double* rhs_present, double* p_present,
18711871

18721872
// Compute ||b||_2 BEFORE running any V-cycles (RHS is still pristine)
18731873
// This matches the convergence mode pattern
1874+
// CRITICAL: Sync to ensure data copy is complete before reading
1875+
CUDA_CHECK_SYNC(cudaDeviceSynchronize());
18741876
{
18751877
auto& finest = *levels_[0];
18761878
const int Ng = finest.Ng;
@@ -1881,9 +1883,10 @@ int MultigridPoissonSolver::solve_device(double* rhs_present, double* p_present,
18811883
const int plane_stride = finest.plane_stride;
18821884
const bool is_2d = finest.is2D();
18831885

1884-
// NVHPC WORKAROUND: Use omp_get_mapped_ptr for actual device addresses
1886+
// NVHPC WORKAROUND: Use member pointer f_level0_ptr_ instead of vector element f_ptrs_[0]
1887+
// Vector element access can return stale addresses in NVHPC
18851888
int device = omp_get_default_device();
1886-
const double* f_ptr = static_cast<const double*>(omp_get_mapped_ptr(f_ptrs_[0], device));
1889+
const double* f_ptr = static_cast<const double*>(omp_get_mapped_ptr(f_level0_ptr_, device));
18871890

18881891
double b_sum_sq = 0.0;
18891892

@@ -1911,6 +1914,12 @@ int MultigridPoissonSolver::solve_device(double* rhs_present, double* p_present,
19111914
}
19121915
}
19131916
b_l2_ = std::sqrt(b_sum_sq);
1917+
1918+
// Sanity check: if ||b||_2 is invalid or garbage, fall back to all cycles
1919+
// This prevents early exit on bad reduction results
1920+
if (!std::isfinite(b_l2_) || b_l2_ < 1e-30) {
1921+
b_l2_ = 0.0; // Force rel_res check to use raw residual
1922+
}
19141923
}
19151924

19161925
// First batch of cycles

0 commit comments

Comments
 (0)