upb: add ASLR-based seed to integer hash function #26907

Open
MindflareX wants to merge 1 commit into protocolbuffers:main from MindflareX:fix/upb-inthash-seed

@MindflareX

Summary

The integer hash function upb_inthash() in upb/hash/common.c is completely deterministic — it uses no seed or randomization. In contrast, the string hash function already uses an ASLR-based seed (_upb_seed) via Wyhash.

This asymmetry lets an attacker precompute integer keys that all hash to the same bucket in upb_inttable, so inserting N entries takes O(N²) time in total. Any protobuf message with a map<int32, ...>, map<int64, ...>, map<uint32, ...>, or map<uint64, ...> field parsed from untrusted input is affected.

Impact

All UPB-backed protobuf runtimes are affected: Python, Ruby, PHP, and Rust.

Empirical measurements (Python 3.13, protobuf 6.33.6, x86-64):

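| Map entries | Parse time, colliding keys | Parse time, normal keys | Ratio |
|-------------|----------------------------|-------------------------|--------|
| 1,000       | 0.001s                     | 0.000s                  | 13×    |
| 5,000       | 0.023s                     | 0.000s                  | 57×    |
| 10,000      | 0.097s                     | 0.001s                  | 164×   |
| 50,000      | 2.3s                       | 0.002s                  | 1,078× |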

A ~500 KB protobuf message with 50,000 colliding keys takes over 1,000× longer to parse than a message with the same number of non-colliding keys. The scaling is quadratic, so 200,000 entries (4× as many) would take roughly 16× as long, about 37 seconds.
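
The 37-second figure follows from applying quadratic scaling to the 50,000-entry measurement. A minimal sketch of that arithmetic, assuming parse time for colliding keys is proportional to N² (the helper name `extrapolate_quadratic` is hypothetical and not part of protobuf):

```python
def extrapolate_quadratic(base_entries: int, base_seconds: float, target_entries: int) -> float:
    """Scale a measured time to another input size, assuming time grows as N^2."""
    scale = target_entries / base_entries
    return base_seconds * scale ** 2

# Measured point from the table above: 50,000 colliding keys took about 2.3 s.
estimate = extrapolate_quadratic(50_000, 2.3, 200_000)
print(f"{estimate:.1f} s")  # 4x the entries -> 16x the time
```

This is only an extrapolation from one data point; real timings depend on the machine and the runtime.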

Collision construction

For map<int32, ...> on a 64-bit platform, the hash is just (uint32_t)key, because the high 32 bits are zero. The bucket index is hash & mask, where mask = table_size - 1. Keys that are multiples of the final table size (a power of 2) have all-zero low bits, so they hash to bucket 0 at the final table size and at every smaller intermediate size the table passes through as it grows:

n_entries = 10_000  # number of map entries to generate
step = 1 << (n_entries - 1).bit_length()  # next power of 2 >= n_entries
colliding_keys = [i * step for i in range(n_entries)]
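
To check that these keys collide during table growth as well as at the final size, the sketch below models the bucket computation described above (`hash & mask`, with the hash equal to the key). The function names are hypothetical and this is a simplified model, not upb's actual table code; it also assumes each colliding insert walks the whole existing chain.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

def bucket(key: int, table_size: int) -> int:
    """Model of the unseeded lookup: the hash is the key itself, masked to the table size."""
    return key & (table_size - 1)

n_entries = 1000
step = next_power_of_2(n_entries)
colliding_keys = [i * step for i in range(n_entries)]

# Every power-of-two table size up to the final one maps all keys to bucket 0.
size = 1
while size <= step:
    assert {bucket(k, size) for k in colliding_keys} == {0}
    size *= 2

# If each insert walks the existing chain, total steps are 0 + 1 + ... + (N - 1).
chain_steps = sum(range(n_entries))
print(chain_steps)  # equals N * (N - 1) / 2
```

The chain-walk total grows as N², which is consistent with the quadratic timings reported above.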

Prior art

  • String-keyed maps already use an ASLR-based seed via _upb_seed (lines 448 and 455)
  • The string hash seed was briefly reverted to a hardcoded constant (commit 6bde8c417, Feb 2025) due to Ruby test flakiness, then restored (commit 8ef81fbd9)
  • C++ protobuf uses absl::Hash with per-allocation randomized salt — not affected
  • Java uses LinkedHashMap with tree fallback at 8 collisions — partially mitigated

Fix

This PR adds a separate ASLR-based seed variable (_upb_int_seed) and XORs it into the key before hashing, matching the approach already used for string-keyed maps. This makes the hash function non-deterministic across process invocations when ASLR is enabled.
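
The effect of a per-process seed can be illustrated with a small model. This is a sketch under stated assumptions, not the PR's code (the description does not show the changed lines): `INT_SEED` stands in for `_upb_int_seed` and is drawn from `secrets` rather than derived from an ASLR address, and `mix64` is a hypothetical mixing step (the SplitMix64 finalizer) applied after the XOR. The mixing step matters in this model: with a bare `hash & mask`, XOR-ing a constant into every key moves all colliding keys to the same different bucket instead of separating them.

```python
import secrets

MASK64 = (1 << 64) - 1
INT_SEED = secrets.randbits(64)  # stand-in for _upb_int_seed (assumption: any per-process random value)

def mix64(x: int) -> int:
    """SplitMix64 finalizer, used here as an illustrative mixing step (an assumption)."""
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)

def seeded_bucket(key: int, table_size: int, seed: int = INT_SEED) -> int:
    """XOR the seed into the key, mix, then mask to a power-of-two table size."""
    return mix64((key ^ seed) & MASK64) & (table_size - 1)

def xor_only_bucket(key: int, table_size: int, seed: int = INT_SEED) -> int:
    """XOR the seed into the key without mixing, for comparison."""
    return (key ^ seed) & (table_size - 1)

table_size = 1024
keys = [i * table_size for i in range(1000)]  # all land in bucket 0 under the unseeded hash

print(len({seeded_bucket(k, table_size) for k in keys}))    # many distinct buckets
print(len({xor_only_bucket(k, table_size) for k in keys}))  # 1: still a single bucket
```

Because the seed changes on each run, an attacker in this model cannot precompute which keys share a bucket once the seeded value is mixed before masking.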

Files changed

  • upb/hash/common.c — source
  • ruby/ext/google/protobuf_c/ruby-upb.c — Ruby amalgamation
  • php/ext/google/protobuf/php-upb.c — PHP amalgamation

@MindflareX MindflareX requested review from a team as code owners April 15, 2026 09:03
@MindflareX MindflareX requested review from JasonLunn and bshaffer and removed request for a team April 15, 2026 09:03
@google-cla

google-cla bot commented Apr 15, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@MindflareX
Author

@googlebot I signed it!

@MindflareX MindflareX force-pushed the fix/upb-inthash-seed branch from 0a58f23 to 991a6f1 Compare April 15, 2026 09:05
@MindflareX
Author

I have already signed the CLA. My previous PRs (#26835, #26851) on this repo passed the CLA check with this same account. Please re-run the CLA check.

The integer hash function `upb_inthash()` is completely deterministic
(no seed), unlike the string hash function which uses an ASLR-based seed
via `_upb_seed`. This allows an attacker to trivially precompute keys
that all hash to the same bucket, causing O(N^2) insertion time for N
map entries during protobuf parsing.

For example, a ~500KB protobuf message with 50,000 colliding int32 map
keys takes ~1000x longer to parse than the same number of non-colliding
keys. This affects all UPB-backed runtimes: Python, Ruby, PHP, and Rust.

The fix adds a separate ASLR-based seed (`_upb_int_seed`) that is XORed
into the key before hashing, matching the approach already used for
string-keyed maps. This makes the hash function non-deterministic across
process invocations when ASLR is enabled.
@MindflareX MindflareX force-pushed the fix/upb-inthash-seed branch from 991a6f1 to 0954699 Compare April 15, 2026 09:09