opsengine commented Nov 19, 2025

This PR introduces optional sharding for client-side aggregation locks to significantly improve throughput in high-concurrency scenarios.

Problem:
The aggregator previously used a single sync.RWMutex for each metric type (counts, gauges, sets). Under high concurrency (many goroutines reporting metrics), this single lock became a major contention point, limiting throughput even when goroutines reported distinct metrics that could otherwise be updated independently.

Solution:
We have introduced a sharding mechanism for these metric maps.

  • The number of shards is configurable via a new option: WithAggregatorShardCount(int).
  • The default shard count is 1, preserving the existing behavior and performance characteristics.
  • When shardCount > 1, metrics are distributed across shards based on a hash of their context (name + tags), reducing lock contention by a factor roughly equal to the shard count (see the sketch below this list).
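
A minimal sketch of the scheme for the counts map. The identifiers here (`shardedCounts`, `shard`, FNV hashing) are illustrative assumptions, not the PR's actual code; only `WithAggregatorShardCount` comes from this PR:

```go
package aggregator

import (
	"hash/fnv"
	"sync"
)

// shard pairs one lock with one slice of the counts map.
type shard struct {
	mu     sync.RWMutex
	counts map[string]int64
}

// shardedCounts spreads a counts map over N independently locked shards.
type shardedCounts struct {
	shards []shard // len == shardCount; 1 preserves the legacy single-lock path
}

func newShardedCounts(shardCount int) *shardedCounts {
	s := &shardedCounts{shards: make([]shard, shardCount)}
	for i := range s.shards {
		s.shards[i].counts = make(map[string]int64)
	}
	return s
}

// shardFor hashes the metric context (name + tags) to a shard index.
// With a single shard the hash is skipped entirely, which is how the
// default configuration avoids any extra cost.
func (s *shardedCounts) shardFor(context string) *shard {
	if len(s.shards) == 1 {
		return &s.shards[0]
	}
	h := fnv.New32a()
	h.Write([]byte(context))
	return &s.shards[h.Sum32()%uint32(len(s.shards))]
}

// add increments a counter while holding only its shard's lock, so
// goroutines touching different shards never contend.
func (s *shardedCounts) add(context string, v int64) {
	sh := s.shardFor(context)
	sh.mu.Lock()
	sh.counts[context] += v
	sh.mu.Unlock()
}
```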

Performance:
Micro-benchmarks demonstrate significant improvements in high-contention scenarios (M4 Max, 14 threads):

High Contention (Concurrent updates to unique metrics):

  • 1 Shard: ~481 ns/op
  • 32 Shards: ~78 ns/op
    -> ~6x throughput improvement

Overhead (Single thread, no contention):

  • Legacy Implementation: ~31.46 ns/op
  • New Implementation (1 Shard): ~31.47 ns/op
    -> No measurable overhead (~0.01 ns/op difference, within noise) for the default configuration.
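
For context, a sketch of the kind of contended micro-benchmark that produces numbers like these, reusing the hypothetical `shardedCounts` type from the sketch above (this is not the PR's actual benchmark; it would live in a `_test.go` file alongside that sketch):

```go
package aggregator

import (
	"fmt"
	"testing"
)

func BenchmarkShardedCountsContended(b *testing.B) {
	sc := newShardedCounts(32) // rerun with 1 to measure the legacy single-lock path
	// Precompute distinct contexts so key formatting stays off the hot path.
	keys := make([]string, 1024)
	for i := range keys {
		keys[i] = fmt.Sprintf("metric.%d|tag:%d", i, i%16)
	}
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			sc.add(keys[i%len(keys)], 1)
			i++
		}
	})
}
```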

The optimization ensures that the hashing cost is completely bypassed when sharding is disabled (default), guaranteeing no regression for existing users.
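
Enabling sharding from application code might look like the following. Only `WithAggregatorShardCount` is the option added in this PR; the import path, constructor, and address are placeholder assumptions:

```go
package main

import (
	"log"

	"github.com/DataDog/datadog-go/v5/statsd" // assumed import path
)

func main() {
	// Hypothetical wiring; only WithAggregatorShardCount(32) comes from this PR.
	client, err := statsd.New("127.0.0.1:8125",
		statsd.WithAggregatorShardCount(32), // default is 1 (sharding disabled)
	)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	client.Incr("requests.handled", []string{"route:/api"}, 1)
}
```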

opsengine force-pushed the shard-aggregator-lock branch from b82b0d1 to e495324 on November 19, 2025 at 23:42
opsengine force-pushed the shard-aggregator-lock branch from e495324 to 989c74a on November 20, 2025 at 01:57