Shard aggregator lock #343
Draft
This PR introduces optional sharding for client-side aggregation locks to significantly improve throughput in high-concurrency scenarios.
Problem:
The aggregator previously used a single `sync.RWMutex` for each metric type (`counts`, `gauges`, `sets`). Under high concurrency (many goroutines reporting metrics), this single lock became a major contention point, limiting throughput even when goroutines were reporting unique metrics.
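For context, the sketch below shows roughly what the pre-change pattern looks like. It takes a read lock when the context already exists, and an exclusive lock plus a re-check when a new context must be inserted. This is a simplified illustration, not the library's code; `singleLockCounts`, `countMetric`, and `count` are hypothetical names. Every goroutine goes through the same `mu`, so each insert of a new unique context blocks all other reporters of that metric type, and even the read-lock path updates the mutex's shared state.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countMetric holds one aggregated counter (hypothetical type). Updates are
// atomic so existing entries can be bumped while holding only the read lock.
type countMetric struct {
	value int64
}

// singleLockCounts mirrors the pre-change layout: one map, one RWMutex.
type singleLockCounts struct {
	mu     sync.RWMutex
	counts map[string]*countMetric
}

func (a *singleLockCounts) count(context string, v int64) {
	// Fast path: the context already exists, so a shared read lock suffices.
	a.mu.RLock()
	if m, ok := a.counts[context]; ok {
		atomic.AddInt64(&m.value, v)
		a.mu.RUnlock()
		return
	}
	a.mu.RUnlock()

	// Slow path: take the exclusive lock and re-check, because another
	// goroutine may have inserted the context between RUnlock and Lock.
	a.mu.Lock()
	if m, ok := a.counts[context]; ok {
		atomic.AddInt64(&m.value, v)
	} else {
		a.counts[context] = &countMetric{value: v}
	}
	a.mu.Unlock()
}

func main() {
	a := &singleLockCounts{counts: make(map[string]*countMetric)}
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				a.count("requests", 1) // all 8 goroutines contend on a.mu
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt64(&a.counts["requests"].value)) // 800
}
```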
We have introduced a sharding mechanism for these metric maps.
- A new option, `WithAggregatorShardCount(int)`, sets the number of shards.
- The shard count defaults to `1`, preserving the existing behavior and performance characteristics.
- When `shardCount > 1`, metrics are distributed across shards based on a hash of their context (name + tags), reducing lock contention by a factor roughly equal to the shard count when contexts are spread evenly across shards.

Performance:
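A minimal sketch of hash-based shard selection, assuming the context key is the metric name joined with its tags and using FNV-1a from the standard `hash/fnv` package (the PR does not state which hash it uses). `shardedCounts`, `newShardedCounts`, `contextKey`, and `shardFor` are hypothetical names. Each update takes only the owning shard's lock, so goroutines working on different contexts usually hit different mutexes.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
	"sync"
)

// countShard is one independently locked slice of the counts map (hypothetical).
type countShard struct {
	mu     sync.RWMutex
	counts map[string]int64
}

// shardedCounts spreads metric contexts over several shards.
type shardedCounts struct {
	shards []*countShard
}

func newShardedCounts(shardCount int) *shardedCounts {
	if shardCount < 1 {
		shardCount = 1 // assumption: non-positive values fall back to one shard
	}
	s := &shardedCounts{shards: make([]*countShard, shardCount)}
	for i := range s.shards {
		s.shards[i] = &countShard{counts: make(map[string]int64)}
	}
	return s
}

// contextKey builds the map key from the metric name and tags (assumed format).
func contextKey(name string, tags []string) string {
	return name + ":" + strings.Join(tags, ",")
}

// shardFor hashes the context with FNV-1a and maps it onto a shard.
// This sketch always hashes, even when there is only one shard.
func (s *shardedCounts) shardFor(key string) *countShard {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash.Write never returns an error
	return s.shards[h.Sum32()%uint32(len(s.shards))]
}

// add increments the count for (name, tags), locking only the owning shard.
func (s *shardedCounts) add(name string, tags []string, delta int64) {
	key := contextKey(name, tags)
	sh := s.shardFor(key)
	sh.mu.Lock()
	sh.counts[key] += delta
	sh.mu.Unlock()
}

// get returns the current count for (name, tags) under the shard's read lock.
func (s *shardedCounts) get(name string, tags []string) int64 {
	key := contextKey(name, tags)
	sh := s.shardFor(key)
	sh.mu.RLock()
	defer sh.mu.RUnlock()
	return sh.counts[key]
}

func main() {
	s := newShardedCounts(8)
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				s.add(fmt.Sprintf("metric.%d", i%10), []string{"env:prod"}, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(s.get("metric.3", []string{"env:prod"})) // 400
}
```

Because a given context always hashes to the same shard, per-context totals stay exact; sharding only changes which lock a goroutine waits on.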
Micro-benchmarks demonstrate significant improvements in high-contention scenarios (M4 Max, 14 threads):
High contention (concurrent updates to unique metrics):
-> ~6x throughput improvement
Overhead (single thread, no contention):
-> Zero overhead introduced for the default configuration.
When sharding is disabled (the default), the hash computation is skipped entirely, so existing users should see no regression.
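One way to keep the default path hash-free is to check the shard count before hashing, as in this sketch. The hand-inlined 32-bit FNV-1a loop is an assumption (it avoids going through the `hash.Hash32` interface on every call); `shardIndex` and `fnv1a` are hypothetical names, not part of the PR.

```go
package main

import "fmt"

// 32-bit FNV-1a parameters (offset basis and prime from the FNV specification).
const (
	fnvOffset32 uint32 = 2166136261
	fnvPrime32  uint32 = 16777619
)

// fnv1a hashes s with 32-bit FNV-1a without allocating.
func fnv1a(s string) uint32 {
	h := fnvOffset32
	for i := 0; i < len(s); i++ {
		h ^= uint32(s[i])
		h *= fnvPrime32 // uint32 multiplication wraps modulo 2^32
	}
	return h
}

// shardIndex returns the shard for a context. With one shard (the default)
// it returns 0 before any hashing, so the unsharded path costs only one
// integer comparison.
func shardIndex(context string, shardCount int) int {
	if shardCount <= 1 {
		return 0 // sharding disabled: skip hashing entirely
	}
	return int(fnv1a(context) % uint32(shardCount))
}

func main() {
	fmt.Println(shardIndex("requests:env:prod", 1))     // 0, no hash computed
	fmt.Println(shardIndex("requests:env:prod", 8) < 8) // true
}
```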