Commit 28fbbf1
Fix GPU ||b||_2 computation: sync and NVHPC workaround
Three fixes for the adaptive MG convergence check on GPU:
1. Add cudaDeviceSynchronize before ||b||_2 computation
The data copy from rhs_present to f_level0_ptr_ may not be complete
without explicit sync, causing the reduction to read stale/garbage data.
2. Use f_level0_ptr_ instead of f_ptrs_[0] for omp_get_mapped_ptr
Vector element access (f_ptrs_[0]) can return stale addresses in NVHPC
target regions. Member pointer f_level0_ptr_ is set once and stable.
3. Add sanity check for garbage b_l2_ values
If the reduction returns NaN/Inf or suspiciously small values, set
b_l2_=0 to force the convergence check to use raw residual instead of
relative. This prevents early exit on bad reduction results.
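The sanity check in fix 3 and the fallback it triggers can be sketched in plain host-side C++ as follows. This is only an illustration under stated assumptions, not the code from this commit: the function names `sanitize_b_l2` and `converged`, the threshold `kMinPlausibleNorm`, and the use of a single tolerance for both the relative and the raw test are all hypothetical, since the commit message does not say what counts as "suspiciously small" or how the tolerance is applied.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical cutoff: the commit message does not state the actual
// threshold used to flag a "suspiciously small" ||b||_2.
constexpr double kMinPlausibleNorm = 1e-30;

// Returns b_l2 unchanged when it looks usable. Returns 0.0 when it is
// NaN, +/-Inf, negative, or below the cutoff; 0.0 is the sentinel that
// tells the convergence check to skip the relative test.
double sanitize_b_l2(double b_l2) {
    if (!std::isfinite(b_l2) || b_l2 < kMinPlausibleNorm) {
        return 0.0;
    }
    return b_l2;
}

// Convergence test on the residual norm r_l2.
// With a usable b_l2 (> 0) it checks the relative residual
// ||r||_2 / ||b||_2 < tol; otherwise it falls back to ||r||_2 < tol.
// Using the same tol in both branches is an assumption of this sketch.
bool converged(double r_l2, double b_l2, double tol) {
    assert(tol > 0.0);
    if (!std::isfinite(r_l2)) {
        return false;  // a bad residual norm never counts as converged
    }
    if (b_l2 > 0.0) {
        return r_l2 < tol * b_l2;  // relative check, written without division
    }
    return r_l2 < tol;  // raw-residual fallback
}
```

In this sketch a caller would compute the reduction, pass the result through `sanitize_b_l2` once, and hand the sanitized value to `converged` on every iteration. A garbage norm then degrades to the stricter raw-residual test instead of producing a spurious early exit.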
These fixes address the GPU CI test failure where GalileanStageBreakdownTest
was failing with 775x divergence ratios instead of expected <3x.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 2476230
1 file changed, 11 additions and 2 deletions, in three hunks:
- new lines 1874-1875 added (2 lines)
- old lines 1884 and 1886 replaced by new lines 1886-1887 and 1889
- new lines 1917-1922 added (6 lines)