Skip to content

Conversation

@arjan-bal
Copy link
Contributor

@arjan-bal arjan-bal commented Dec 17, 2025

This change adds a new tiered buffer pool that uses power-of-2 tier sizes. It reduces the lookup time for the relevant sizedBufferPool from $O(\log n)$ to $O(1)$, where n is the number of tiers. This creates constant-time lookups independent of the tier count, allowing users to add more tiers without performance overhead.

Benchmarks

Micro-benchmark that measures only the pool query performance, ignoring the allocation time:

func BenchmarkSearch(b *testing.B) {
	defaultBufferPoolSizes := make([]int, len(defaultBufferPoolSizeExponents))
	for i, exp := range defaultBufferPoolSizeExponents {
		defaultBufferPoolSizes[i] = 1 << exp
	}
	b.Run("pool=Tiered", func(b *testing.B) {
		p := NewTieredBufferPool(defaultBufferPoolSizes...).(*tieredBufferPool)
		for b.Loop() {
			for size := range 1 << 19 {
				// One for get, one for put.
				_ = p.getPool(size)
				_ = p.getPool(size)
			}
		}
	})

	b.Run("pool=BinaryTiered", func(b *testing.B) {
		p := NewBinaryTieredBufferPool(defaultBufferPoolSizeExponents...).(*binaryTieredBufferPool)
		for b.Loop() {
			for size := range 1 << 19 {
				_ = p.poolForGet(size)
				_ = p.poolForPut(size)
			}
		}
	})
}

With 5 tiers:

go test -bench=BenchmarkSearch -count=10 -benchmem | benchstat -col '/pool' -
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
          │   Tiered    │            BinaryTiered             │
          │   sec/op    │   sec/op     vs base                │
Search-48   5.353m ± 2%   2.036m ± 0%  -61.97% (p=0.000 n=10)

          │   Tiered   │          BinaryTiered          │
          │    B/op    │    B/op     vs base            │
Search-48   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

          │   Tiered   │          BinaryTiered          │
          │ allocs/op  │ allocs/op   vs base            │
Search-48   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

With 9 tiers:

go test -bench=BenchmarkSearch -count=10 -benchmem | benchstat -col '/pool' -
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
          │   Tiered    │            BinaryTiered             │
          │   sec/op    │   sec/op     vs base                │
Search-48   5.659m ± 0%   2.035m ± 0%  -64.04% (p=0.000 n=10)

          │   Tiered   │          BinaryTiered          │
          │    B/op    │    B/op     vs base            │
Search-48   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

          │   Tiered   │          BinaryTiered          │
          │ allocs/op  │ allocs/op   vs base            │
Search-48   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

RELEASE NOTES:

  • mem: Add faster tiered buffer pool. Use NewBinaryTieredBufferPool to create such pools.

@arjan-bal arjan-bal added this to the 1.79 Release milestone Dec 17, 2025
@arjan-bal arjan-bal added Type: Performance Performance improvements (CPU, network, memory, etc) Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. labels Dec 17, 2025
@arjan-bal arjan-bal requested review from dfawley and easwars December 17, 2025 10:25
@codecov
Copy link

codecov bot commented Dec 17, 2025

Codecov Report

❌ Patch coverage is 92.15686% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.25%. Comparing base (81a00ce) to head (f32f229).
⚠️ Report is 70 commits behind head on master.

Files with missing lines Patch % Lines
mem/buffer_pool.go 92.15% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8775      +/-   ##
==========================================
+ Coverage   83.22%   83.25%   +0.02%     
==========================================
  Files         418      414       -4     
  Lines       32385    32801     +416     
==========================================
+ Hits        26952    27308     +356     
- Misses       4050     4077      +27     
- Partials     1383     1416      +33     
Files with missing lines Coverage Δ
mem/buffer_pool.go 94.49% <92.15%> (-2.12%) ⬇️

... and 66 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@arjan-bal arjan-bal force-pushed the effecient-tiered-buffer-pool branch from bdccc9b to 1f26e66 Compare December 17, 2025 12:28
Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we decided to make it possible for the user to set the default buffer pool for the whole process through an experimental API, and get rid of the existing dial option and the server option. Was your plan to do that in a follow-up?


// Determine the maximum exponent we need to support.
// bits.Len64(math.MaxUint64) is 63.
const maxExponent = 63
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this assume a 64-bit machine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works for machines up to 64 bits, i.e. both 32 and 64 bits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a value of 63 is passed to this function, capSize := 1 << exp will evaluate to a negative number on 64 bit machines, since capSize is a signed integer.

Also, how would these bit shift operators (where the exponent is greater than 32 bits) return expected values on 32 bit machines?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have some tests for the edge cases for both 32 and 64 bit machines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I've updated the implementation to use bits.UintSize to determine the machine's word size and set that as the maximum tier size. I also added tests that mock bits.UintSize to simulate both 32-bit and 64-bit environments.

Comment on lines 149 to 151
if exp > maxExponent || exp < 0 {
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we silently ignoring certain exponents that don't meet our criteria? Should we error out instead?

Copy link
Contributor Author

@arjan-bal arjan-bal Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment explaining the reason to ignore the values.

  1. For values greater than 63, it's not possible to allocate such large slices. So ignoring them will not cause observable changes. I changed the param type to uint8 to help give linter errors if people pass constant values > 256.
  2. For values < 1, the buffer sizes would be fractional, e.g. $2^{-1}=0.5$. Since slices are of integer sizes, a sized buffer pool with a fractional size will never get used. I changed the param type to uint, making it impossible to pass negative values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to return an error in such cases instead of ignoring certain exponents?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to return an error from the constructor.

@easwars easwars assigned arjan-bal and unassigned easwars and dfawley Dec 17, 2025
@arjan-bal
Copy link
Contributor Author

I thought we decided to make it possible for the user to set the default buffer pool for the whole process through an experimental API, and get rid of the existing dial option and the server option. Was your plan to do that in a follow-up?

I was waiting for the author of #8770 to raise a PR for exposing a function to set the default buffer pool. See the second part of #8770 (comment).

get rid of the existing dial option and the server option

I'm not sure if we want to do this. Maybe people want to use different buffer pools for each channel.


In this PR, I'm just improving the existing buffer pool implementation.

@arjan-bal arjan-bal assigned easwars and unassigned arjan-bal Dec 18, 2025
@arjan-bal arjan-bal force-pushed the effecient-tiered-buffer-pool branch from b50277e to 8e7ecc4 Compare December 18, 2025 19:51
@arjan-bal arjan-bal force-pushed the effecient-tiered-buffer-pool branch from 8e7ecc4 to 36d34f7 Compare December 18, 2025 19:54
@easwars
Copy link
Contributor

easwars commented Jan 20, 2026

Also, would it make sense to add some of the micro benchmarks that you used as part of this PR? Thanks.

@easwars easwars assigned arjan-bal and unassigned easwars Jan 20, 2026
Comment on lines 130 to 132
// NewBinaryTieredBufferPool returns a BufferPool implementation that uses
// multiple underlying pools of the given pool sizes. The pool sizes must be
// powers of 2. This enables O(1) lookup when getting or putting a buffer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section needs some updating I guess given that we accept exponents as arguments (and the next paragraph also says so).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


// Determine the maximum exponent we need to support.
// bits.Len64(math.MaxUint64) is 63.
const maxExponent = 63
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a value of 63 is passed to this function, capSize := 1 << exp will evaluate to a negative number on 64 bit machines, since capSize is a signed integer.

Also, how would these bit shift operators (where the exponent is greater than 32 bits) return expected values on 32 bit machines?


// Determine the maximum exponent we need to support.
// bits.Len64(math.MaxUint64) is 63.
const maxExponent = 63
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have some tests for the edge cases for both 32 and 64 bit machines.

Comment on lines 149 to 151
if exp > maxExponent || exp < 0 {
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to return an error in such cases instead of ignoring certain exponents?

@arjan-bal arjan-bal force-pushed the effecient-tiered-buffer-pool branch from 3f63cea to e467b97 Compare February 3, 2026 11:15
@arjan-bal arjan-bal assigned easwars and unassigned arjan-bal Feb 3, 2026
@arjan-bal
Copy link
Contributor Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new binaryTieredBufferPool which provides a significant performance improvement for buffer pool lookups by using power-of-2 tier sizes for O(1) lookups. The implementation is clever, leveraging math/bits for efficient calculations. The accompanying tests are thorough, including architecture-specific checks and benchmarks that demonstrate the performance gains. I've found a minor issue in one of the new benchmark tests that could lead to a panic. Overall, this is an excellent contribution that improves performance and is well-implemented.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@arjan-bal arjan-bal force-pushed the effecient-tiered-buffer-pool branch from 1502569 to f32f229 Compare February 3, 2026 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. Type: Performance Performance improvements (CPU, network, memory, etc)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants