csharp: cap pre-@type token buffer in JsonParser.MergeAny to prevent O(N^2) memory growth#26851
Open
MindflareX wants to merge 1 commit intoprotocolbuffers:mainfrom
Open
csharp: cap pre-@type token buffer in JsonParser.MergeAny to prevent O(N^2) memory growth#26851MindflareX wants to merge 1 commit intoprotocolbuffers:mainfrom
MindflareX wants to merge 1 commit intoprotocolbuffers:mainfrom
Conversation
JsonParser.MergeAny scans an Any object's body for the "@type" property by consuming and recording tokens into a per-call List<JsonToken> until "@type" is reached. The recording is unbounded and includes the full content of any nested objects encountered before "@type", so an Any whose "value" subtree itself contains further "@type"-last Any objects forces every MergeAny stack frame to buffer the unread remainder of its body before recursing. The combined cost across N nested layers grows as O(N^2), allowing a small input (a few hundred KB) to allocate gigabytes of GC memory and exhaust the parser well before any recursion-depth limit is reached. Callers that legitimately raise JsonParser.Settings.RecursionLimit remain affected by this regardless of any other limit checks. Bound the per-call buffer to 100 tokens. The protobuf JSON canonical form places "@type" first; this cap is generous enough for any well-formed Any body in either ordering, while preventing the quadratic growth. Includes a regression test that constructs a 200-deep "@type"-last Any payload and asserts it is rejected with an InvalidProtocolBufferException mentioning "@type".
Contributor
|
I don't have the time to review this at this time. |
Author
|
Understood, thanks. It's independent of #26835, so no ordering needed — feel free to come back to it (or hand it off) whenever. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
JsonParser.MergeAnyscans anAnyobject's body for the@typeproperty by consuming and recording tokens into a per-callList<JsonToken>until@typeis reached. The recording is unbounded and includes the full content of any nested objects encountered before@type, so anAnywhosevaluesubtree itself contains further@type-lastAnyobjects forces everyMergeAnystack frame to buffer the unread remainder of its body before recursing.The combined cost across N nested layers grows as O(N²): a small input (a few hundred KB) can allocate gigabytes of GC memory and exhaust the parser well before any recursion-depth limit is reached. Callers that legitimately raise
JsonParser.Settings.RecursionLimitremain affected regardless of any other limit checks.Empirical reproduction
Building a payload of the form
{"value":{"value":{...{}...,"@type":".../Any"},...},"@type":".../Any"}(i.e.valuefirst,@typelast at every layer) against the unpatched parser:Doubling the depth quadruples the allocation, confirming the quadratic shape. The amplification is independent of the recursion-depth limit and is reachable regardless of the
RecursionLimitvalue.Fix
Bound the per-call token buffer to 100 tokens. The protobuf JSON canonical form places
@typefirst (1 token recorded); even in the@type-last form this cap allows roughly 50 scalar fields ahead of@type, which is more than sufficient for any well-formedAnybody. When the cap is exceeded, the parser throwsInvalidProtocolBufferExceptionwith a clear message pointing the user at the@typeplacement.The cap is intentionally small and constant (rather than scaled to
RecursionLimit) so that the worst-case per-call buffer is bounded irrespective of how callers configure their settings.Test plan
Any_TypeUrlBufferLimit(200-deep@type-last payload, asserts rejection with a message mentioning@type)JsonParserTestcases pass (332 existing + 1 new)Any_RegularMessageandAny_WellKnownTypetests, which use@type-last payloads with 5-8 pre-@typetokens, continue to passnetstandard1.1,netstandard2.0,net45,net50Related
This is a separate root cause from the recursion-depth bypass addressed in #26835 and applies independently of that fix. After #26835 lands, the default
RecursionLimit=100happens to bound the worst-case memory amplification to ~1 MB, but any caller that raises the limit (e.g. for legitimately deep typed-Anypayloads) remains exposed to the full quadratic growth without this change.