coro: Extend coro stack for SIMD operated yyjson #11586

Open
cosmo0920 wants to merge 1 commit into master from cosmo0920-extend-coro-stack-for-simd-yyjson

Conversation

@cosmo0920
Contributor

@cosmo0920 cosmo0920 commented Mar 19, 2026

Root Cause Analysis

Enabling SIMD causes the default JSON pack APIs (flb_pack_json() and flb_pack_json_recs()) to be routed through the extensible (_ext) path, which selects the YYJSON backend.

The YYJSON parser (yyjson_read_opts()) allocates a very large stack frame at function entry: 0x9520 bytes, i.e. 38,176 bytes (~38 KB, about 37.3 KiB):

sub $0x9520, %rsp   // ~38 KiB
(gdb) disassemble /r yyjson_read_opts
Dump of assembler code for function yyjson_read_opts:
 0x000055555592a6f0 <+0>: 55 push %rbp
 0x000055555592a6f1 <+1>: 48 89 e5 mov %rsp,%rbp
 0x000055555592a6f4 <+4>: 48 81 ec 20 95 00 00 sub $0x9520,%rsp
 => 0x000055555592a6fb <+11>: 48 89 bd e8 6f ff ff mov %rdi,-0x9018(%rbp)
 0x000055555592a702 <+18>: 48 89 b5 e0 6f ff ff mov %rsi,-0x9020(%rbp)
 0x000055555592a709 <+25>: 89 95 dc 6f ff ff mov %edx,-0x9024(%rbp)
 0x000055555592a70f <+31>: 48 89 8d d0 6f ff ff mov %rcx,-0x9030(%rbp)
 0x000055555592a716 <+38>: 4c 89 85 c8 6f ff ff mov %r8,-0x9038(%rbp)
 0x000055555592a71d <+45>: 48 8b 05 64 58 e2 00 mov 0xe25864(%rip),%rax # 0x55555674ff88  
 0x000055555592a724 <+52>: 48 8b 38 mov (%rax),%rdi

In the current execution context (libco coroutine stack), this exceeds the available stack space. As a result, the stack pointer (rsp) crosses into an unmapped/guard region immediately after the allocation. The crash occurs at the first local variable write:

mov %rdi, -0x9018(%rbp)

This happens before any JSON parsing logic executes, so the failure is not related to input data or parsing correctness.
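
The overflow described above comes down to simple arithmetic. The following is a minimal sketch, not Fluent Bit code: `frame_fits` is a hypothetical helper, and the stack sizes used in the usage note are illustrative assumptions (24576 bytes corresponds to the default formula if PTHREAD_STACK_MIN is 16384 and STACK_FACTOR is 1, neither of which is stated in this description).

```c
#include <assert.h>
#include <stddef.h>

/* Frame size taken from the prologue above: sub $0x9520,%rsp */
#define YYJSON_READ_OPTS_FRAME ((size_t) 0x9520) /* 38176 bytes */

/*
 * Hypothetical helper (not part of Fluent Bit): returns 1 when a frame of
 * `frame` bytes fits in a stack of `stack_size` bytes of which `used`
 * bytes are already consumed by the callers, and 0 otherwise.
 */
int frame_fits(size_t stack_size, size_t used, size_t frame)
{
    assert(used <= stack_size);
    return frame <= stack_size - used;
}
```

Under those assumptions, `frame_fits(24576, 0, YYJSON_READ_OPTS_FRAME)` returns 0 even for a completely empty stack, which is consistent with the fault at the first store after the `sub`.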


Key Observations

  • Function arguments are passed correctly (verified via registers: rdi, rsi, rdx, etc.).

  • The crash occurs in the function prologue, not in parsing logic.

  • yyjson_read_opts() requires significantly more stack than the legacy pack path.

  • Increasing coroutine stack size (e.g., via STACK_FACTOR) did not resolve the issue, indicating:

    • The change may not affect this execution context, or
    • The effective available stack is still insufficient.

Why This Is a Regression

Before this change:

flb_pack_json → legacy path (small stack usage)

After this change (with SIMD enabled):

flb_pack_json → _ext → yyjson → ~38 KiB stack usage

This rerouting exposes existing callers (e.g., output plugins like Kinesis) to a backend whose stack requirements are incompatible with the coroutine stack constraints.


Conclusion

The issue is not caused by invalid inputs or API misuse, but by:

Routing the default JSON pack APIs to a backend (yyjson) that requires a large stack frame, which exceeds the limits of the libco coroutine stack.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Refactor
    • Optimized coroutine stack memory allocation to better utilize system resources and improve performance when SIMD support is available.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
@coderabbitai

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

A new SIMD_STACK_FACTOR preprocessor macro was introduced in the coroutine header file that conditionally evaluates to 2 when SIMD support is enabled and 1 otherwise. The default coroutine stack size calculation was updated to apply this factor, scaling stack allocation upward for SIMD-enabled builds.

Changes

| Cohort / File(s) | Summary |
|---|---|
| SIMD Stack Factor: include/fluent-bit/flb_coro.h | Introduced SIMD_STACK_FACTOR macro and updated FLB_CORO_STACK_SIZE_BYTE default computation to multiply by this factor in addition to STACK_FACTOR, enabling larger stack allocation when SIMD support is available. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 A coroutine's stack grows wide,
With SIMD powers multiplied,
Two factors strong now work as one,
The vectoring work will swiftly run!

🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The title 'coro: Extend coro stack for SIMD operated yyjson' accurately describes the main change: extending coroutine stack size to accommodate SIMD-enabled yyjson's larger stack requirements. |

@cosmo0920 cosmo0920 changed the title coro: Extend coro stack for SIM operated yyjson coro: Extend coro stack for SIMD operated yyjson Mar 19, 2026
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
include/fluent-bit/flb_coro.h (1)

80-84: Consider guarding manual FLB_CORO_STACK_SIZE against undersizing in SIMD builds.

If FLB_CORO_STACK_SIZE is defined, the SIMD multiplier path is skipped entirely. Adding a compile-time check (or at least a prominent warning comment) for a minimum safe size would prevent reintroducing the same crash via custom builds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/fluent-bit/flb_coro.h` around lines 80 - 84, When FLB_CORO_STACK_SIZE
is user-defined it bypasses the SIMD-aware default; add a compile-time guard
that compares FLB_CORO_STACK_SIZE against the computed safe minimum ((3 *
STACK_FACTOR * SIMD_STACK_FACTOR * PTHREAD_STACK_MIN) / 2) and emit a `#error` or
`#warning` if the user value is smaller, so update the FLB_CORO_STACK_SIZE /
FLB_CORO_STACK_SIZE_BYTE logic to validate FLB_CORO_STACK_SIZE using that
expression (referencing FLB_CORO_STACK_SIZE, FLB_CORO_STACK_SIZE_BYTE,
SIMD_STACK_FACTOR, STACK_FACTOR, PTHREAD_STACK_MIN) and fail/notify the build
when undersized.
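
One way the suggested guard might look is sketched below. It is not the reviewed code: `FLB_CORO_STACK_SIZE_MIN` and `coro_override_is_safe` are hypothetical names, and the 48 KiB value is only an illustrative floor. A fixed constant is used instead of the PTHREAD_STACK_MIN expression because on some C libraries PTHREAD_STACK_MIN may not be a constant the preprocessor can evaluate in `#if`.

```c
#include <stddef.h>

/* Hypothetical floor in bytes; must be a plain integer expression for #if. */
#define FLB_CORO_STACK_SIZE_MIN (48 * 1024)

/* Fail the build when a user-defined override is below the floor. */
#if defined(FLB_CORO_STACK_SIZE) && (FLB_CORO_STACK_SIZE < FLB_CORO_STACK_SIZE_MIN)
#error "FLB_CORO_STACK_SIZE is smaller than FLB_CORO_STACK_SIZE_MIN"
#endif

/* Hypothetical runtime equivalent for sizes that come from configuration. */
int coro_override_is_safe(size_t user_size)
{
    return user_size >= (size_t) FLB_CORO_STACK_SIZE_MIN;
}
```

Replacing `#error` with `#warning` would notify rather than fail, as the comment allows.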

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 19ad2cd5-d306-44c5-b55d-374bf85efcde

📥 Commits

Reviewing files that changed from the base of the PR and between d257e00 and 152deba.

📒 Files selected for processing (1)
  • include/fluent-bit/flb_coro.h

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 152debaa7b

 #define FLB_CORO_STACK_SIZE_BYTE FLB_CORO_STACK_SIZE
 #else
-#define FLB_CORO_STACK_SIZE_BYTE ((3 * STACK_FACTOR * PTHREAD_STACK_MIN) / 2)
+#define FLB_CORO_STACK_SIZE_BYTE ((3 * STACK_FACTOR * SIMD_STACK_FACTOR * PTHREAD_STACK_MIN) / 2)

[P1] Use an absolute minimum for SIMD coroutine stacks

On musl-based builds PTHREAD_STACK_MIN is 2048, so this formula produces only 6144 bytes when FLB_HAVE_SIMD is enabled. That is still far below the ~38 KiB yyjson_read_opts() frame called out in the commit message, and config->coro_stack_size is initialized from FLB_CORO_STACK_SIZE_BYTE in src/flb_config.c:394, so Alpine/musl packages will continue to hit the same coroutine overflow. Please clamp the SIMD path to a real byte floor instead of another PTHREAD_STACK_MIN multiple.
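
The clamp requested here could be sketched as below. This is an illustration only: `coro_stack_size_clamped` and `CORO_STACK_FLOOR_BYTES` are hypothetical names, and the 64 KiB floor is an assumed value chosen to sit above the ~38 KiB frame, not a figure proposed in the PR.

```c
#include <stddef.h>

/* Hypothetical absolute floor for SIMD builds, in bytes. */
#define CORO_STACK_FLOOR_BYTES ((size_t) 64 * 1024)

/*
 * Hypothetical helper: applies the PR's formula, then raises the result
 * to the byte floor when the SIMD multiplier is active.
 */
size_t coro_stack_size_clamped(size_t pthread_stack_min,
                               size_t stack_factor,
                               size_t simd_stack_factor)
{
    size_t scaled = (3 * stack_factor * simd_stack_factor * pthread_stack_min) / 2;

    if (simd_stack_factor > 1 && scaled < CORO_STACK_FLOOR_BYTES) {
        return CORO_STACK_FLOOR_BYTES;
    }
    return scaled;
}
```

With the musl value quoted above, `coro_stack_size_clamped(2048, 1, 2)` yields the 65536-byte floor instead of 6144, while non-SIMD builds keep the unmodified formula.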

 #define FLB_CORO_STACK_SIZE_BYTE FLB_CORO_STACK_SIZE
 #else
-#define FLB_CORO_STACK_SIZE_BYTE ((3 * STACK_FACTOR * PTHREAD_STACK_MIN) / 2)
+#define FLB_CORO_STACK_SIZE_BYTE ((3 * STACK_FACTOR * SIMD_STACK_FACTOR * PTHREAD_STACK_MIN) / 2)

[P2] Limit the larger stack to yyjson callers

config->coro_stack_size is a global default that every co_create(config->coro_stack_size, ...) site consumes, including input collectors (include/fluent-bit/flb_input.h:666), scheduler timers (src/flb_scheduler.c:499), and output flush coroutines (include/fluent-bit/flb_output.h:1155). With this multiplier, any FLB_SIMD build now reserves 2x heap for every coroutine even when that coroutine never enters the yyjson pack path that motivated the fix, so deployments with many concurrent flush/retry coroutines take a measurable memory regression just by enabling SIMD.
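
The size of that regression can be estimated with a short calculation. The sketch below is illustrative: `extra_stack_bytes` is a hypothetical helper, and the inputs in the usage note (a 24576-byte base stack and 1000 live coroutines) are assumptions, not measurements from this PR.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: additional bytes reserved across `coroutines`
 * coroutine stacks when the base stack size is multiplied by
 * `simd_stack_factor` (1 means no change).
 */
size_t extra_stack_bytes(size_t base_stack_size,
                         size_t simd_stack_factor,
                         size_t coroutines)
{
    assert(simd_stack_factor >= 1);
    return base_stack_size * (simd_stack_factor - 1) * coroutines;
}
```

For example, `extra_stack_bytes(24576, 2, 1000)` gives 24,576,000 bytes, roughly 23.4 MiB of additional stack allocation, regardless of how many of those coroutines ever call into yyjson.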
