@jenshannoschwalm
Collaborator

@jenshannoschwalm commented Jan 29, 2026

The OpenCL guided filter (used in blending and hazeremoval) requires 21 floats/pixel internally and thus often leads to CPU fallbacks (overall requirements are >30 floats/pixel) when processing large images in HQ darkroom mode or during export.
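
To illustrate why such per-pixel requirements can exceed device memory, here is a minimal sketch of the arithmetic. The helper names `required_bytes` and `fits_on_device` are hypothetical and not part of darktable, and the real tiling calculation also accounts for overhead and headroom that this sketch ignores.

```c
#include <stdint.h>

/* Hypothetical helper (not darktable API): bytes needed to process a
   width x height image when floats_per_pixel single-precision values
   are required for every pixel. */
uint64_t required_bytes(uint64_t width, uint64_t height, uint64_t floats_per_pixel)
{
  return width * height * floats_per_pixel * (uint64_t)sizeof(float);
}

/* Hypothetical helper: returns 1 if the whole image fits into
   available_bytes of device memory, 0 otherwise. */
int fits_on_device(uint64_t width, uint64_t height, uint64_t floats_per_pixel,
                   uint64_t available_bytes)
{
  return required_bytes(width, height, floats_per_pixel) <= available_bytes;
}
```

For example, an 8000x6000 image at 30 floats/pixel needs 5,760,000,000 bytes (about 5.8 GB), which is more than many GPUs can provide for a single module.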

This commit introduces internal tiling for improved performance. The overlap, and thus the efficacy of internal tiling, depends on the guided filter radius, but performance is still much better.
We use the device-specific advantage hint if provided.
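
The effect of the overlap on tiling efficacy can be sketched as follows. This is an illustration only: `tile_span` and `tile_efficiency` are hypothetical names, the overlap is passed in by the caller, and how the actual code derives the overlap from the guided filter radius is not shown here.

```c
/* Hypothetical 1D description of a tile: the range that is read
   (including overlap) and the range that is actually written. */
typedef struct
{
  int read_start, read_end; /* pixels read, [read_start, read_end) */
  int out_start, out_end;   /* pixels written, [out_start, out_end) */
} tile_span_t;

/* Span of tile number `index` along one axis of `total` pixels, where every
   tile writes up to `useful` pixels and reads `overlap` extra pixels on each
   side, clamped to the image borders. */
tile_span_t tile_span(int index, int total, int useful, int overlap)
{
  tile_span_t t;
  t.out_start = index * useful;
  t.out_end = (t.out_start + useful < total) ? t.out_start + useful : total;
  t.read_start = (t.out_start - overlap > 0) ? t.out_start - overlap : 0;
  t.read_end = (t.out_end + overlap < total) ? t.out_end + overlap : total;
  return t;
}

/* Fraction of the pixels processed in an interior tile of tile_w x tile_h
   that end up in the output; returns 0.0 if the overlap consumes the tile. */
double tile_efficiency(int tile_w, int tile_h, int overlap)
{
  const int useful_w = tile_w - 2 * overlap;
  const int useful_h = tile_h - 2 * overlap;
  if(useful_w <= 0 || useful_h <= 0) return 0.0;
  return ((double)useful_w * (double)useful_h) / ((double)tile_w * (double)tile_h);
}
```

With a 1024x1024 tile, an overlap of 64 pixels keeps about 77% of the processed pixels useful, while an overlap of 256 pixels leaves only 25%, which is why a large filter radius makes tiling less effective.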

With the OpenCL code now tiled internally, we relax the calculation of blending tiling requirements.

  • some formatting
  • updated kernels to access a specific part of the guide

BTW this work was inspired by the first proposed "benchmark" in https://discuss.pixls.us/t/dt-performance-analyzer-v0-6/55563/54

@jenshannoschwalm added this to the 5.6 milestone Jan 29, 2026
@jenshannoschwalm added the "scope: performance" (doing everything the same but faster) and "OpenCL" (Related to darktable OpenCL code) labels Jan 29, 2026
@jenshannoschwalm
Collaborator Author

Force-pushed an updated version (as spotted by @ralfbrown) and updated the main text.

@jenshannoschwalm
Collaborator Author

Release note: Increased performance for OpenCL guided filter by internal tiling.

The OpenCL guided filter (used in blending and hazeremoval) requires 18 floats/pixel internally
and thus often leads to CPU fallbacks (overall requirements are ~30 floats/pixel) when processing
large images in HQ darkroom mode or during export.

This commit introduces internal tiling for improved performance.
The overlap, and thus the efficacy of internal tiling, depends on the guided filter radius, but
performance is still much better.
We use the device-specific advantage hint if provided.

With the OpenCL code now tiled internally, we relax the calculation of blending tiling requirements.

- some formatting
- updated kernels to access a specific part of the guide