perf: pool zstd encoders via sync.Pool and reduce allocations #380
jpkrohling wants to merge 1 commit into splunk:main from
Conversation
Add zstd encoder pooling to amortize encoder creation cost across writer instances, matching the pattern used by OTLP exporters.

Changes:

- Add sync.Pool for *zstd.Encoder with WithEncoderConcurrency(1)
- Add Close() to FrameEncoder to return encoders to the pool
- Add Close() to generated writers (template + regenerated code)
- Clear the stale writer reference before returning an encoder to the pool
- Fix constructor error paths to avoid leaking pooled encoders
- Pre-allocate uvarintBuf to avoid an allocation per frame restart
- Fix exporter Shutdown to flush, close the writer under the mutex, and nil-guard the flusher to prevent a panic after shutdown

Benchmark (repeated writer create/write/close, the exporter pattern):

| FrameEncoderZstd/pooled | before | after | change |
|---|---|---|---|
| time/op | 1059µs | 3.7µs | -99.65% |
| B/op | 19.1MB | 391B | -100.00% |
| allocs/op | 55 | 6 | -89.09% |

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
For context: I compared STEF with other options, and the performance of STEF+zstd was quite bad in terms of CPU and memory. That seemed unreasonable to me, so I compared how this uses zstd with how we use it in the Collector and found that STEF isn't pooling the encoders. This change should bring the performance on par with OTLP gRPC/HTTP with zstd.
@jpkrohling I would like to understand your use case better before we go ahead with this. STEF streams are long-lasting, and each stream creates only one zstd encoder per stream lifetime, so the rate of zstd encoder creation should be very low. I don't see how this can have a significant performance impact unless you are somehow creating and destroying a lot of STEF streams at a high rate. If that's what you are doing, I want to understand why. The Collector exporter/receiver should not be doing that; the streams are supposed to be kept open for several minutes.
Good to challenge that! I had tried to isolate the STEF parts in my benchmark when comparing with OTLP, and that didn't reflect the actual behavior of the exporter. I refactored the benchmark so that it's closer to reality, and the results are really great. I'll try to get this implemented in the next few days in an internal component here and report back. In any case, I'd appreciate some feedback about the benchmark. Does it look sane to you? Am I missing something else?
Summary
- sync.Pool for zstd encoders with WithEncoderConcurrency(1) to amortize encoder creation cost across writer instances
- Close() method on FrameEncoder and all generated writers to return encoders to the pool
- Pre-allocated uvarintBuf to avoid an allocation per frame restart
- Shutdown flushes/closes the writer under the mutex, with a nil-guard in the flusher

Benchmark
Repeated writer create/write/close cycle (the exporter pattern):
Test plan
- Tests pass (go/pkg, go/otel, go/pdata, otelcol, all examples)
- BenchmarkFrameEncoderZstd validates pooling performance
- Close() is idempotent (nil-guarded, safe to call multiple times)
- Generated code regenerated (stefc)

Resolves E-1535
🤖 Generated with Claude Code