Skip to content

Prevent urlencoded content foo=bar from being deserialized when serializer is set to JSON#1771

Open
icyavocado wants to merge 2 commits intoPerlDancer:mainfrom
icyavocado:pr/issue-1770-allow-urlencoded-data-in-serializer-json
Open

Prevent urlencoded content foo=bar from being deserialized when serializer is set to JSON#1771
icyavocado wants to merge 2 commits intoPerlDancer:mainfrom
icyavocado:pr/issue-1770-allow-urlencoded-data-in-serializer-json

Conversation

@icyavocado
Copy link
Contributor

This PR will allow sending requests with URL-encoded content, eg: foo=bar, to the Dancer2 route, even when the serializer is set to JSON

Test added to support this new behavior.

Resolve ##1770

- set serializer to JSON, and turn on canonical for a more reliable test
- passing conent `foo=bar&bar=` with application/x-www-form-urlencoded
- return body_parameters and expect the data returns JSON
- the body contents shouldn't look like JSON and contains an `=`
- header should be set to application/x-www-form-urlencoded
@icyavocado icyavocado changed the title Allow urlencoded content foo=bar from being deserialized when serializer is set to JSON Prevent urlencoded content foo=bar from being deserialized when serializer is set to JSON Mar 2, 2026
@xsawyerx
Copy link
Member

xsawyerx commented Mar 2, 2026

When you set JSON serializer on, you're essentially saying "assume everything received should be decoded from JSON and everything sent should be encoded to JSON" but this says "well, except..." and I'm not sure that's a good idea.

Setting an automatic serializer is good for very restricted situations in which you have an app that shouldn't receive input (or send output) in any other format. We shouldn't be fitting it to various edge-cases.

However, @veryrusty is the expert on serializers. :)

@icyavocado
Copy link
Contributor Author

icyavocado commented Mar 3, 2026

I will wait for a suggestion from the team!

Here is my thought process:
I understand that when you set the serializer => "JSON", it would not accept any type of data that is not JSON, and the result is throwing an error.

In this example, the request declared that the data is not JSON, and in this case, using url encoded header. So, I am thinking Dancer2::Core::JSON should be flexible and not try to deserialize the raw data while preserving the serialization of the return data.

I also can also see something like this #1577 would work in term of having to maintain my own version of JSON that allowed the behaviour.

@toddr-bot
Copy link

PR Review — Prevent urlencoded content foo=bar from being deserialized when serializer is set to JSON

The PR addresses a real pain point (urlencoded form data failing when a JSON serializer is set), but the implementation has design concerns that align with the maintainer's feedback. The body-sniffing heuristic (!~ /^\s*[[{]/ and =~ /=/) is fragile — if the content type is application/x-www-form-urlencoded, that declaration alone should be sufficient to skip JSON deserialization without inspecting the body. The guard should also be moved after the serializer existence check to avoid unnecessary work. The test has an is_deeply buried inside a route handler where failures can be masked. Edge cases around the heuristic boundaries are untested. This needs revision before merging.


🟡 Important

1. Content-type check placed before serializer existence check (lib/Dancer2/Core/Request.pm, L193-202)
The new guard is placed before the my $serializer = $self->serializer or return; line. This means even when no serializer is configured at all (the common case for most Dancer2 apps), this code will execute the content-type regex, body checks, etc. on every request that calls deserialize(). It's wasted work. More importantly, the existing multipart/form-data guard at line 190 has the same issue — but adding a second one compounds it.

Consider moving the new check to after the serializer existence check (after the current line 199), so it only runs when a serializer is actually configured. This also makes the intent clearer: "we have a serializer, but we're choosing not to use it for this content type."

2. Body regex heuristic is fragile and bypassable (lib/Dancer2/Core/Request.pm, L197)
The check $self->body !~ /^\s*[[{]/ combined with $self->body =~ /=/ is a content-sniffing heuristic that tries to distinguish URL-encoded data from JSON. This has issues:

  1. False negatives: URL-encoded data can start with [ or { (e.g., {key}=value or [0]=value — common in PHP-style array notation). These would bypass the guard and be fed to the JSON deserializer, which will fail.
  2. False positives: A body like x=1 with Content-Type: application/x-www-form-urlencoded that is intentionally meant to be deserialized (unlikely but possible) would be silently skipped.
  3. Content-type should be authoritative: If Content-Type is application/x-www-form-urlencoded, that alone should be sufficient to skip JSON deserialization — no body sniffing needed. The content type is the declaration of format. The regex heuristics add complexity without clear benefit and could be removed, leaving just the content-type check.

At minimum, if keeping the heuristic, document why body sniffing is needed on top of the content-type check.

&& $self->body !~ /^\s*[[{]/   # Not JSON
&& $self->body =~ /=/    # has an '='

3. Test uses is_deeply inside route handler — failures are invisible (t/classes/Dancer2-Core-Request/serializers.t, L95)
The ::is_deeply(...) call on line 95 (inside the route handler) runs during request processing. If it fails, the TAP failure output goes to STDERR but the subtest may still pass because the outer is($res->content, ...) check at line 105 could independently succeed or the response status isn't validated. This means a body_parameters mismatch could go unnoticed.

Better approach: remove the is_deeply from inside the handler and instead make the handler simply return \%body_parameters. Then assert on the response content in the subtest (which you already do). Alternatively, check $res->code is 200 in the subtest to catch handler-level failures.

::is_deeply(\%body_parameters, { foo => 'bar', bar => '' }, 'Correct body parameters');

🟢 Suggestions

1. Redundant body check — body is checked again later (lib/Dancer2/Core/Request.pm, L195)
The condition checks $self->body (truthiness) at line 198, but the existing code at lines 209-212 already checks $body && length $body > 0 before attempting deserialization. If the body is empty/undef, deserialization would already be skipped. This makes the $self->body check in the new guard partially redundant (though it does serve to short-circuit the regex checks).

&& $self->body

2. Missing test cases for edge cases and the heuristic boundaries (t/classes/Dancer2-Core-Request/serializers.t, L92-108)
The test only covers the happy path (foo=bar&bar=). Given the body-sniffing heuristics in the implementation, these edge cases should also be tested:

  1. URL-encoded body that starts with { or [ (e.g., {foo}=bar) — does the heuristic incorrectly try to JSON-parse it?
  2. JSON body sent with application/x-www-form-urlencoded content type — is it correctly deserialized as JSON or skipped?
  3. URL-encoded body without Content-Type header — does it still hit the serializer?
  4. Request with Content-Type: application/json and normal JSON body — regression check that normal JSON deserialization still works (could be a separate subtest or verified existing tests cover this).

Checklist

  • No hardcoded secrets
  • Input validation at boundaries — warning incompatible splat #2 — body-sniffing heuristic is bypassable
  • No injection or shell interpolation risks
  • Test coverage for new branches — suggestion template dsl does not work #5 — edge cases untested
  • Test isolation
  • Tests validate behavior not implementation

Summary

The PR addresses a real pain point (urlencoded form data failing when a JSON serializer is set), but the implementation has design concerns that align with the maintainer's feedback. The body-sniffing heuristic (!~ /^\s*[[{]/ and =~ /=/) is fragile — if the content type is application/x-www-form-urlencoded, that declaration alone should be sufficient to skip JSON deserialization without inspecting the body. The guard should also be moved after the serializer existence check to avoid unnecessary work. The test has an is_deeply buried inside a route handler where failures can be masked. Edge cases around the heuristic boundaries are untested. This needs revision before merging.


Automated review by Kōan

@toddr-bot
Copy link

@xsawyerx: When you set JSON serializer on, you're essentially saying "assume everything received should be d...

Your instinct is right to push back here. The current implementation tries to have it both ways: "everything is JSON, except when it's not" — and it detects "not JSON" via fragile body-sniffing regexes.

If the team does decide to support this, a cleaner approach would be to only check Content-Type without sniffing the body. If a request declares Content-Type: application/x-www-form-urlencoded, the framework should respect that declaration and skip JSON deserialization — the content type is the contract. This avoids the heuristic fragility (bodies starting with [ or {, bodies without =, etc.).

That said, your broader point stands: enabling a global serializer means opting into strict format expectations. A more composable solution might be per-route serializer configuration (as referenced in PR #1577), which avoids special-casing content types in the core deserialization path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants