Skip to content

r.sim: Parallelize dx/dy derivatives computation using OpenMP#7094

Open
Abhi-d-gr8 wants to merge 1 commit intoOSGeo:mainfrom
Abhi-d-gr8:dxdy-multicore
Open

r.sim: Parallelize dx/dy derivatives computation using OpenMP#7094
Abhi-d-gr8 wants to merge 1 commit intoOSGeo:mainfrom
Abhi-d-gr8:dxdy-multicore

Conversation

@Abhi-d-gr8
Copy link
Contributor

@Abhi-d-gr8 Abhi-d-gr8 commented Feb 16, 2026

Overview

This PR parallelizes the dx/dy slope derivatives computation in simlib/derivatives.c (Horn 3×3 method), addressing #7039.

The main simulation already supports multiple cores, but the dx/dy computation was still serial. This change enables that step to use available CPU cores as well.


What changed

  • Added an OpenMP parallel for on the outer row loop.
  • Used default(none) with explicit shared(...) and private(...) scoping.
  • Preserved original math and behavior.
  • Serial builds (without OpenMP) remain unchanged.

Performance Evaluation

To isolate the impact of this change, timing was performed inside derivatives() only, excluding raster I/O and the main simulation loop.

This avoids masking the effect, since total r.sim.water runtime is dominated by the simulation phase.

Test Setup

  • Grid size: 5000 × 5000 (25 million cells)
  • Interleaved runs: 1-thread vs 8-thread
  • First run discarded to avoid warm-up bias
  • Measured using omp_get_wtime() around derivatives() only

Results (Apple M3 Air)

  • OMP_NUM_THREADS=1: ~0.19 s (average)
  • OMP_NUM_THREADS=8: ~0.06 s (average)
  • Speedup: ~3.2×

This confirms that the dx/dy computation scales across cores as intended.

As expected, the overall r.sim.water runtime improvement is smaller since derivatives() is only one stage of the workflow.

telegram-cloud-photo-size-5-6093558501759192775-y


Build / OpenMP Notes

To observe multi-threaded behavior in derivatives():

  • GRASS must be configured with OpenMP support:

    ./configure --with-openmp
    
  • The compiler must support OpenMP (e.g., libomp on macOS).

  • The source includes guarded OpenMP headers:

    #if defined(_OPENMP)
    #include <omp.h>
    #endif
  • Control the number of threads via:

    OMP_NUM_THREADS=<N>
    

If OpenMP is not available, the code compiles and runs serially without any behavior or numerical changes.


Reproducible Benchmark Command

For reference, this is the exact command used for the interleaved benchmark:

g.gisenv set=DEBUG=1
g.region n=5000 s=0 e=5000 w=0 res=1

echo "=== Interleaved 1-vs-8 thread runs (5000x5000 = 25M cells) ==="

for nprocs in 1 8 1 8 1 8; do
  r.sim.water elevation=elev rain_value=50 man_value=0.05 \
      depth=depth_bench discharge=disch_bench \
      nwalkers=5000 niterations=0 random_seed=1 nprocs=$nprocs \
      2>&1 | grep -E "OpenMP threads|derivatives time"
  echo "---"
done

Timing instrumentation (added for evaluation)

To isolate the performance of derivatives() (not masked by the main simulation),I temporarily added the following OpenMP-guarded timing block:

diff --git a/raster/r.sim/simlib/derivatives.c b/raster/r.sim/simlib/derivatives.c
--- a/raster/r.sim/simlib/derivatives.c
+++ b/raster/r.sim/simlib/derivatives.c
@@ -23,6 +23,11 @@
 H = (geometry->stepx / geometry->conv) * 8.0;
 V = (geometry->stepy / geometry->conv) * 8.0;

+#if defined(_OPENMP)
+    G_debug(1, "OpenMP threads (derivatives): %d", omp_get_max_threads());
+    double t0 = omp_get_wtime();
+#endif

 #if defined(_OPENMP)
 #pragma omp parallel for default(none) schedule(static) \
@@ -101,4 +109,8 @@
     }

+#if defined(_OPENMP)
+    G_debug(1, "dx/dy derivatives time: %.4f s", omp_get_wtime() - t0);
+#endif
 }

It used for benchmarking and does not affect numerical behavior.

@github-actions github-actions bot added raster Related to raster data processing C Related code is in C module labels Feb 16, 2026
@wenzeslaus
Copy link
Member

Can you please add the diff into your description to add the time measuring code?

@Abhi-d-gr8
Copy link
Contributor Author

Hi @wenzeslaus !
I have added the diff in the description (at the end), to calculate the time measuring code of the derivatives.c

@wenzeslaus
Copy link
Member

Awesome! I'm getting good results locally. Let's finalize it! Some valgrind-like scrutinizing would be good. There is some discussion in issues or PRs about how to do that with threads in our context in relation to r.mapcalc parallel code.

image
for nprocs in 1 8 1 8 1 8 1 4 1 4 1 4 1 36 1 36 1 36 1 36; do   r.sim.water elevation=elevation rain_value=50 man_value=0.05       depth=depth_bench discharge=disch_bench       nwalkers=5000 niterations=0 random_seed=1 nprocs=$nprocs --o       2>&1 | grep -E "threads";   echo "---"; done
diff --git a/raster/r.sim/simlib/derivatives.c b/raster/r.sim/simlib/derivatives.c
index 1519055783..9df42503b7 100644
--- a/raster/r.sim/simlib/derivatives.c
+++ b/raster/r.sim/simlib/derivatives.c
@@ -22,6 +22,11 @@ void derivatives(const Geometry *geometry, float **elevation, double **dx,
     H = (geometry->stepx / geometry->conv) * 8.0;
     V = (geometry->stepy / geometry->conv) * 8.0;
 
+#if defined(_OPENMP)
+    //G_debug(1, "OpenMP threads (derivatives): %d", omp_get_max_threads());
+    double t0 = omp_get_wtime();
+#endif
+
 #if defined(_OPENMP)
 #pragma omp parallel for default(none) schedule(static) \
     shared(geometry, elevation, dx, dy, H, V)           \
@@ -101,4 +106,9 @@ void derivatives(const Geometry *geometry, float **elevation, double **dx,
             dy[row][col] = ((c7 + c8 + c8 + c9) - (c1 + c2 + c2 + c3)) / V;
         }
     }
+
+#if defined(_OPENMP)
+    G_debug(1, "threads=%d, time=%.4f", omp_get_max_threads(), omp_get_wtime() - t0);
+#endif
+
 }
# generated

import pandas as pd
import matplotlib.pyplot as plt

# Raw data
data = [
    (1, 0.0537), (8, 0.0164),
    (1, 0.0540), (8, 0.0119),
    (1, 0.0546), (8, 0.0113),
    (1, 0.0557), (4, 0.0208),
    (1, 0.0538), (4, 0.0263),
    (1, 0.0534), (4, 0.0202),
    (1, 0.0557), (36, 0.0126),
    (1, 0.0525), (36, 0.0061),
    (1, 0.0536), (36, 0.0092),
    (1, 0.0553), (36, 0.0104),
]

df = pd.DataFrame(data, columns=["threads", "time"])

# Aggregate statistics: mean and full range
stats = (
    df.groupby("threads")["time"]
    .agg(mean="mean", min="min", max="max")
    .reset_index()
)

# Compute asymmetric error bars (min/max range)
yerr = [
    stats["mean"] - stats["min"],
    stats["max"] - stats["mean"],
]

# Plot
plt.figure()
plt.errorbar(
    stats["threads"],
    stats["mean"],
    yerr=yerr,
    fmt="o-"
)
plt.xlabel("Number of threads")
plt.ylabel("Time (s)")
plt.title("Threads vs Time with Full Data Range (min–max)")
plt.grid(True)

plt.show()

@@ -1,4 +1,6 @@
#include "simlib.h"
#include <grass/gis.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed? (It is for debug, but here without it...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is not needed , I was using different approaches so had included it earlier but yes its not required.

@wenzeslaus
Copy link
Member

Another thing, do the current tests actually cover the two cases for dx-dy and for enabled/disabled parallelism?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C Related code is in C module raster Related to raster data processing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants