out_stackdriver: fix batch drop on invalid labels by erain · Pull Request #11539 · fluent/fluent-bit

erain · 2026-03-11T19:12:59Z

Description

When a single record contains a logging.googleapis.com/labels field that is not a map, the out_stackdriver plugin currently drops the entire batch. This causes data loss for all other valid records in that batch.

Root Cause

In stackdriver_format(), the labels type check had two problems:

It was checked after operation, sourceLocation, and httpRequest extraction, requiring complex cleanup on the error path (which was also missing destroy_http_request(), causing a memory leak).
The same drop decision was duplicated across the prescan loop and main packing loop, creating fragile, divergent validation paths.

Fix

Extracted a shared should_skip_record() helper that validates both insertId and labels in one place:

static int should_skip_record(struct flb_stackdriver *ctx,
                              msgpack_object *obj,
                              int log_errors)

This helper is now used:

In the prescan loop (silent, log_errors=FLB_FALSE) to correctly compute array_size (entries count).
At the top of the main packing loop (log_errors=FLB_TRUE), before any field extraction, so invalid records are skipped cleanly with no cleanup needed.

Scope

This PR fixes only the labels not a map per-record drop case.

The following batch-level drop scenarios remain unchanged as they affect shared batch-level state by design:

flb_log_event_decoder_init() failure
flb_log_event_decoder_next() failure
k8s local_resource_id extraction/processing failures (k8s_container, k8s_node, k8s_pod)
process_local_resource_id() failures
Final msgpack-to-JSON serialization failure

Testing

New Tests (7 total)

Test	Scenario
`labels_not_a_map`	Single record with labels as string → no output
`labels_not_a_map_custom_key`	Non-default `labels_key` with string value → no output
`labels_not_a_map_with_extracted_fields`	Record with invalid labels + httpRequest, operation, sourceLocation, trace, spanId → exercises skip path with extracted fields present
`batch_labels_not_a_map`	3-record batch (valid/invalid/valid) → 2 entries, bad record dropped
`batch_first_record_labels_not_a_map`	First record invalid, second valid → only valid record emitted
`batch_all_records_labels_not_a_map`	All records have invalid labels → no output
`batch_mixed_errors`	4-record batch mixing invalid insertId + invalid labels + valid → only valid records emitted

Batch tests with content assertions (batch_labels_not_a_map, batch_first_record_labels_not_a_map, batch_mixed_errors) use tail/file-backed fixtures to guarantee single-batch semantics. The batch_all_records_labels_not_a_map test uses a fixture file as well but only asserts no output.

Regression Tests (all pass)

insertId_common_case, empty_insertId, insertId_incorrect_type_int, default_labels, custom_labels, config_labels_conflict, config_labels_no_conflict, default_labels_k8s_resource_type, custom_labels_k8s_resource_type, severity_multi_entries

Summary by CodeRabbit

Bug Fixes
- Improved entry validation: malformed labels or invalid insertId are now detected earlier and dropped cleanly; processing continues for valid entries and a clear warning is emitted when an entire batch is skipped.
Tests
- Added extensive tests covering single-record and batch scenarios with various invalid label configurations to prevent regressions.

coderabbitai · 2026-03-11T19:13:17Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2da3f756-8b77-49d0-b04d-245845565ac9

📥 Commits

Reviewing files that changed from the base of the PR and between 7928a60 and faa836c.

⛔ Files ignored due to path filters (4)

tests/runtime/data/stackdriver/stackdriver_batch_all_labels_not_a_map.log is excluded by !**/*.log
tests/runtime/data/stackdriver/stackdriver_batch_first_record_labels_not_a_map.log is excluded by !**/*.log
tests/runtime/data/stackdriver/stackdriver_batch_labels_not_a_map.log is excluded by !**/*.log
tests/runtime/data/stackdriver/stackdriver_batch_mixed_errors.log is excluded by !**/*.log

📒 Files selected for processing (3)

plugins/out_stackdriver/stackdriver.c
tests/runtime/data/stackdriver/stackdriver_test_labels.h
tests/runtime/out_stackdriver.c

📝 Walkthrough

Walkthrough

This PR centralizes record validation in the Stackdriver plugin by adding a should_skip_record helper to validate insertId and payload labels type, replacing inlined checks; it also adds test data macros and seven new tests covering invalid-labels and mixed-batch scenarios.

Changes

Cohort / File(s)	Summary
Stackdriver Plugin Validation Refactor `plugins/out_stackdriver/stackdriver.c`	Adds `should_skip_record(ctx, obj, log_errors)` to centralize insertId and payload `labels` validation. Replaces per-record inline checks during prescan and entry processing, logs errors conditionally, and adds a batch-level warning when all entries are skipped.
Test Data: invalid labels `tests/runtime/data/stackdriver/stackdriver_test_labels.h`	Adds seven test macros representing invalid-`labels` scenarios (labels as non-map, custom key variants, records with extracted fields, and mixed/fully-invalid batch variants).
Runtime tests: Stackdriver `tests/runtime/out_stackdriver.c`	Adds seven new exported test functions and corresponding `TEST_LIST` entries to exercise invalid-labels handling and mixed-batch behaviors.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Issue #11541: Matches the centralization of per-record skipping via should_skip_record, converting batch-fatal invalid-labels behavior to per-record skipping.

Suggested reviewers

braydonk

Poem

🐇 I hop through records, sniffing each line,
If labels are wrong, I skip them just fine,
insertId checked, no more batch-wide fright,
Tests bloom like carrots in morning light,
Hooray — validation neat and bright! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 4.76% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main fix: addressing batch dropping when invalid labels are encountered in the stackdriver plugin.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📝 Coding Plan for PR comments

Generate coding plan

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

plugins/out_stackdriver/stackdriver.c (1)
1927-1929: ⚠️ Potential issue | 🟡 Minor

Log a summary reason before returning on all-invalid batches.

If prescan rejects every record, the function returns before the logged pass, so there is no message explaining that all entries were skipped for invalid insertId or labels. Emitting one summary warning here would make this failure mode diagnosable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/out_stackdriver/stackdriver.c` around lines 1927 - 1929, When prescan
leaves no valid records (array_size == 0) add a summary warning log immediately
before the early return to explain why the batch is dropped (e.g., "all entries
skipped: invalid insertId or labels"); use the same logging facility already
used in this file (the local logger used elsewhere in stackdriver.c) and
reference the prescan result/array_size check so the message is emitted whenever
array_size == 0 instead of returning silently.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/out_stackdriver/stackdriver.c`:
- Around line 1917-1923: The loop that decrements array_size when
should_skip_record() returns true adjusts the emitted count locally but
cb_stackdriver_flush() still reports event_chunk->total_events; update the code
to propagate the actual emitted/processed count to cb_stackdriver_flush() (e.g.,
compute emitted_count by starting from event_chunk->total_events and
decrementing for each skipped record inside the
flb_log_event_decoder_next()/should_skip_record() loop or accumulate
emitted_count directly) and change metric reporting in cb_stackdriver_flush() to
use this emitted_count (or a new parameter name you add) instead of
event_chunk->total_events so dropped records are reflected in flush and
dropped-record metrics.

---

Outside diff comments:
In `@plugins/out_stackdriver/stackdriver.c`:
- Around line 1927-1929: When prescan leaves no valid records (array_size == 0)
add a summary warning log immediately before the early return to explain why the
batch is dropped (e.g., "all entries skipped: invalid insertId or labels"); use
the same logging facility already used in this file (the local logger used
elsewhere in stackdriver.c) and reference the prescan result/array_size check so
the message is emitted whenever array_size == 0 instead of returning silently.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d8e76ba-c3cc-4894-b540-33bba85548b2

📥 Commits

Reviewing files that changed from the base of the PR and between a1d9c2a and 7928a60.

⛔ Files ignored due to path filters (4)

tests/runtime/data/stackdriver/stackdriver_batch_all_labels_not_a_map.log is excluded by !**/*.log
tests/runtime/data/stackdriver/stackdriver_batch_first_record_labels_not_a_map.log is excluded by !**/*.log
tests/runtime/data/stackdriver/stackdriver_batch_labels_not_a_map.log is excluded by !**/*.log
tests/runtime/data/stackdriver/stackdriver_batch_mixed_errors.log is excluded by !**/*.log

📒 Files selected for processing (3)

plugins/out_stackdriver/stackdriver.c
tests/runtime/data/stackdriver/stackdriver_test_labels.h
tests/runtime/out_stackdriver.c

coderabbitai · 2026-03-11T22:08:24Z

plugins/out_stackdriver/stackdriver.c

    while ((ret = flb_log_event_decoder_next(
                    &log_decoder,
                    &log_event)) == FLB_EVENT_DECODER_SUCCESS) {
-        /* Extract insertId */
-        in_status = validate_insert_id(&insert_id_obj, log_event.body);
-
-        if (in_status == INSERTID_INVALID) {
-            flb_plg_error(ctx->ins,
-                          "Incorrect insertId received. InsertId should be non-empty string.");
+        if (should_skip_record(ctx, log_event.body, FLB_FALSE) == FLB_TRUE) {
            array_size -= 1;
        }
    }


⚠️ Potential issue | 🟠 Major

Propagate skipped-record count to the flush metrics.

array_size is reduced here, but the flush path still reports event_chunk->total_events as processed. After this change, a batch with locally skipped records will be counted as fully successful on HTTP 200, and dropped-record metrics will miss the records discarded by should_skip_record(). Please surface the emitted count to cb_stackdriver_flush() and use that for metrics.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@plugins/out_stackdriver/stackdriver.c` around lines 1917 - 1923, The loop that decrements array_size when should_skip_record() returns true adjusts the emitted count locally but cb_stackdriver_flush() still reports event_chunk->total_events; update the code to propagate the actual emitted/processed count to cb_stackdriver_flush() (e.g., compute emitted_count by starting from event_chunk->total_events and decrementing for each skipped record inside the flb_log_event_decoder_next()/should_skip_record() loop or accumulate emitted_count directly) and change metric reporting in cb_stackdriver_flush() to use this emitted_count (or a new parameter name you add) instead of event_chunk->total_events so dropped records are reflected in flush and dropped-record metrics.

When a single record contains a logging.googleapis.com/labels field that is not a map, the entire batch is dropped. This causes data loss for all other valid records in the batch. Extract a shared should_skip_record() helper that validates both insertId and labels fields. Use it in the prescan loop (for array sizing) and at the top of the main packing loop (before any field extraction), so invalid records are skipped cleanly without complex cleanup paths. This change only addresses the labels-not-a-map case. Other batch- level drop scenarios (k8s local_resource_id failures, decoder init failures, JSON serialization failures) remain unchanged as they affect shared batch-level state by design. Signed-off-by: Yu Yi <yiyu@yiyu.me>

erain force-pushed the fix/stackdriver-record-drop branch 3 times, most recently from fcf50e3 to 7928a60 Compare March 11, 2026 20:54

erain mentioned this pull request Mar 11, 2026

Roadmap: complete out_stackdriver record-drop error handling #11541

Open

7 tasks

erain marked this pull request as ready for review March 11, 2026 22:02

erain requested a review from braydonk as a code owner March 11, 2026 22:02

coderabbitai bot reviewed Mar 11, 2026

View reviewed changes

erain force-pushed the fix/stackdriver-record-drop branch from 7928a60 to faa836c Compare March 12, 2026 14:10

erain temporarily deployed to pr March 12, 2026 14:10 — with GitHub Actions Inactive

erain temporarily deployed to pr March 12, 2026 14:31 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

out_stackdriver: fix batch drop on invalid labels#11539

out_stackdriver: fix batch drop on invalid labels#11539
erain wants to merge 1 commit intofluent:masterfrom
erain:fix/stackdriver-record-drop

erain commented Mar 11, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 11, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

erain commented Mar 11, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Root Cause

Fix

Scope

Testing

New Tests (7 total)

Regression Tests (all pass)

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

erain commented Mar 11, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 11, 2026 •

edited

Loading