Description
On high-throughput clusters, the current static dataSamplingRate configuration requires manual tuning and cannot respond to changing traffic conditions. Users must guess an appropriate sampling rate: too aggressive a rate causes unnecessary data loss during quiet periods, while too permissive a rate causes buffer overflow during traffic spikes.
Because the rate is static, it cannot adapt to load changes. When retina_lost_events_total starts climbing, there is no automatic mechanism to reduce event volume.
Describe the solution you'd like
Related to feature request #1966.
Implement adaptive sampling using BPF ring buffer back-pressure (requires kernel 5.8+). With ring buffers, bpf_ringbuf_reserve() returns NULL when the buffer is full, providing natural back-pressure without explicit sampling logic.
This approach provides:
- Zero sampling overhead while the buffer has capacity: no random number generation or map lookups per packet
- Automatic adaptation: events are dropped only when the buffer is actually full
- Configurable capacity: users tune buffer size rather than sampling rate
- Predictable behavior: buffer size directly controls memory usage and burst capacity
This should be implemented alongside the BPF ring buffer feature request, as it depends on BPF_MAP_TYPE_RINGBUF (kernel 5.8+).
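The reserve-or-drop behavior described above can be illustrated with a small userspace model. This is only a sketch, not BPF code and not Retina's implementation: `model_ringbuf`, `model_reserve`, `model_emit`, `model_drain`, and `MODEL_SLOTS` are hypothetical names, fixed-size slots stand in for variable-length records, and `lost_events` stands in for a counter such as retina_lost_events_total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SLOTS 4u /* hypothetical capacity, in events */

struct event {
    uint64_t ts;
    uint32_t bytes;
};

/* Userspace model of a ring buffer with drop-on-full back-pressure. */
struct model_ringbuf {
    struct event slots[MODEL_SLOTS];
    uint64_t producer_pos; /* total events reserved so far */
    uint64_t consumer_pos; /* total events consumed so far */
    uint64_t lost_events;  /* events dropped because the buffer was full */
};

/* Mimics the contract of bpf_ringbuf_reserve(): NULL when there is no room. */
static struct event *model_reserve(struct model_ringbuf *rb)
{
    assert(rb != NULL);
    if (rb->producer_pos - rb->consumer_pos >= MODEL_SLOTS)
        return NULL;
    return &rb->slots[rb->producer_pos++ % MODEL_SLOTS];
}

/* Producer path: no sampling decision, only a reserve that may fail.
 * Returns 1 if the event was written, 0 if it was dropped. */
static int model_emit(struct model_ringbuf *rb, uint64_t ts, uint32_t bytes)
{
    struct event *e = model_reserve(rb);
    if (e == NULL) {
        rb->lost_events++;
        return 0;
    }
    e->ts = ts;
    e->bytes = bytes;
    return 1;
}

/* Consumer path: drain up to max events and return how many were consumed. */
static unsigned model_drain(struct model_ringbuf *rb, unsigned max)
{
    unsigned n = 0;
    assert(rb != NULL);
    while (n < max && rb->consumer_pos < rb->producer_pos) {
        rb->consumer_pos++;
        n++;
    }
    return n;
}
```

In this model, events are lost only while the consumer lags by a full buffer. As soon as the consumer drains some slots, the producer succeeds again without any rate parameter being adjusted.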
Describe alternatives you've considered
- BPF map-based rate control - Userspace monitors load and writes sampling rate to a BPF map that the BPF program reads per-packet. Adds map lookup overhead and has feedback delay between userspace detection and BPF adjustment.
- Token bucket in BPF - Implement rate limiting entirely in BPF using per-CPU maps. Complex to implement correctly with per-CPU state management and token refill logic.
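To give a sense of the state and refill logic the token bucket alternative would need, here is a minimal userspace sketch of a single bucket. It is an illustration under assumptions, not a proposed implementation: `token_bucket` and `bucket_allow` are hypothetical names, and a per-CPU BPF version would need one such state per CPU plus a way to split the overall rate between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Hypothetical single-bucket state; rate_per_sec must be greater than 0. */
struct token_bucket {
    uint64_t tokens;       /* tokens currently available */
    uint64_t burst;        /* maximum tokens the bucket can hold */
    uint64_t rate_per_sec; /* tokens added per second */
    uint64_t last_ns;      /* timestamp up to which refill was applied */
};

/* Returns 1 if an event may be emitted at now_ns, 0 if it must be dropped.
 * Note: elapsed * rate_per_sec can overflow for very long idle gaps;
 * a real implementation would have to clamp elapsed first. */
static int bucket_allow(struct token_bucket *b, uint64_t now_ns)
{
    assert(b != NULL && b->rate_per_sec > 0);

    uint64_t elapsed = now_ns - b->last_ns;
    uint64_t refill = elapsed * b->rate_per_sec / NSEC_PER_SEC;

    if (refill > 0) {
        uint64_t total = b->tokens + refill;
        b->tokens = total > b->burst ? b->burst : total;
        /* Advance only by the time that produced whole tokens,
         * so fractional progress toward the next token is kept. */
        b->last_ns += refill * NSEC_PER_SEC / b->rate_per_sec;
    }

    if (b->tokens == 0)
        return 0;
    b->tokens--;
    return 1;
}
```

Even this single-bucket version has to handle timestamp bookkeeping, integer rounding, and overflow, which is the complexity the alternative refers to.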
Additional context
This feature is tied to the BPF ring buffer implementation. Ring buffers provide natural back-pressure that eliminates the need for explicit adaptive sampling logic. The buffer itself becomes the adaptation mechanism. Users configure buffer size based on their memory budget and acceptable burst capacity, and the system automatically drops events only when that capacity is exceeded.
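The relationship between buffer size and burst capacity can be estimated with simple arithmetic. The sketch below assumes the kernel ring buffer's 8-byte record header and 8-byte record alignment, and the requirement that the buffer size be a power of two and page-aligned. The helper names and the 56-byte event size used in the usage note are hypothetical and not taken from Retina.

```c
#include <assert.h>
#include <stdint.h>

#define RINGBUF_HDR_SZ 8u /* per-record header in the BPF ring buffer */

/* Records are padded to a multiple of 8 bytes. */
static uint64_t round_up_8(uint64_t n)
{
    return (n + 7u) & ~(uint64_t)7u;
}

/* Smallest power of two that is >= n (for n >= 1). */
static uint64_t next_pow2(uint64_t n)
{
    uint64_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Approximate number of events of event_bytes that fit in buf_bytes. */
static uint64_t burst_capacity(uint64_t buf_bytes, uint32_t event_bytes)
{
    assert(event_bytes > 0);
    return buf_bytes / round_up_8((uint64_t)event_bytes + RINGBUF_HDR_SZ);
}

/* Smallest buffer size (power of two, at least one page) that can absorb
 * a burst of `events` events. page_size is assumed to be a power of two. */
static uint64_t buffer_for_burst(uint64_t events, uint32_t event_bytes,
                                 uint64_t page_size)
{
    uint64_t need = events * round_up_8((uint64_t)event_bytes + RINGBUF_HDR_SZ);
    if (need < page_size)
        need = page_size;
    return next_pow2(need);
}
```

For example, with a hypothetical 56-byte event, each record occupies 64 bytes, so a 1 MiB buffer holds about 16384 events, and absorbing a 10000-event burst needs a 1 MiB buffer once rounded up to a power of two.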
Reference: https://nakryiko.com/posts/bpf-ringbuf/ "BPF ring buffer provides a special BPF_RB_NO_WAKEUP flag that can be used to avoid waking up user-space when buffer space is available, as well as BPF_RB_FORCE_WAKEUP to force wake-up."
Related to #655 as that kickstarted our internal investigation