Skip to content

out_stackdriver: fix batch drop on invalid labels#11539

Open
erain wants to merge 1 commit intofluent:masterfrom
erain:fix/stackdriver-record-drop
Open

out_stackdriver: fix batch drop on invalid labels#11539
erain wants to merge 1 commit intofluent:masterfrom
erain:fix/stackdriver-record-drop

Conversation

@erain
Copy link
Contributor

@erain erain commented Mar 11, 2026

Description

When a single record contains a logging.googleapis.com/labels field that is not a map, the out_stackdriver plugin currently drops the entire batch. This causes data loss for all other valid records in that batch.

Root Cause

In stackdriver_format(), the labels type check had two problems:

  1. It was checked after operation, sourceLocation, and httpRequest extraction, requiring complex cleanup on the error path (which was also missing destroy_http_request(), causing a memory leak).
  2. The same drop decision was duplicated across the prescan loop and main packing loop, creating fragile, divergent validation paths.

Fix

Extracted a shared should_skip_record() helper that validates both insertId and labels in one place:

static int should_skip_record(struct flb_stackdriver *ctx,
                              msgpack_object *obj,
                              int log_errors)

This helper is now used:

  • In the prescan loop (silent, log_errors=FLB_FALSE) to correctly compute array_size (entries count).
  • At the top of the main packing loop (log_errors=FLB_TRUE), before any field extraction, so invalid records are skipped cleanly with no cleanup needed.

Scope

This PR fixes only the labels not a map per-record drop case.

The following batch-level drop scenarios remain unchanged as they affect shared batch-level state by design:

  • flb_log_event_decoder_init() failure
  • flb_log_event_decoder_next() failure
  • k8s local_resource_id extraction/processing failures (k8s_container, k8s_node, k8s_pod)
  • process_local_resource_id() failures
  • Final msgpack-to-JSON serialization failure

Testing

New Tests (7 total)

Test Scenario
labels_not_a_map Single record with labels as string → no output
labels_not_a_map_custom_key Non-default labels_key with string value → no output
labels_not_a_map_with_extracted_fields Record with invalid labels + httpRequest, operation, sourceLocation, trace, spanId → exercises skip path with extracted fields present
batch_labels_not_a_map 3-record batch (valid/invalid/valid) → 2 entries, bad record dropped
batch_first_record_labels_not_a_map First record invalid, second valid → only valid record emitted
batch_all_records_labels_not_a_map All records have invalid labels → no output
batch_mixed_errors 4-record batch mixing invalid insertId + invalid labels + valid → only valid records emitted

Batch tests with content assertions (batch_labels_not_a_map, batch_first_record_labels_not_a_map, batch_mixed_errors) use tail/file-backed fixtures to guarantee single-batch semantics. The batch_all_records_labels_not_a_map test uses a fixture file as well but only asserts no output.

Regression Tests (all pass)

insertId_common_case, empty_insertId, insertId_incorrect_type_int, default_labels, custom_labels, config_labels_conflict, config_labels_no_conflict, default_labels_k8s_resource_type, custom_labels_k8s_resource_type, severity_multi_entries

Summary by CodeRabbit

  • Bug Fixes

    • Improved entry validation: malformed labels or invalid insertId are now detected earlier and dropped cleanly; processing continues for valid entries and a clear warning is emitted when an entire batch is skipped.
  • Tests

    • Added extensive tests covering single-record and batch scenarios with various invalid label configurations to prevent regressions.

@coderabbitai
Copy link

coderabbitai bot commented Mar 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2da3f756-8b77-49d0-b04d-245845565ac9

📥 Commits

Reviewing files that changed from the base of the PR and between 7928a60 and faa836c.

⛔ Files ignored due to path filters (4)
  • tests/runtime/data/stackdriver/stackdriver_batch_all_labels_not_a_map.log is excluded by !**/*.log
  • tests/runtime/data/stackdriver/stackdriver_batch_first_record_labels_not_a_map.log is excluded by !**/*.log
  • tests/runtime/data/stackdriver/stackdriver_batch_labels_not_a_map.log is excluded by !**/*.log
  • tests/runtime/data/stackdriver/stackdriver_batch_mixed_errors.log is excluded by !**/*.log
📒 Files selected for processing (3)
  • plugins/out_stackdriver/stackdriver.c
  • tests/runtime/data/stackdriver/stackdriver_test_labels.h
  • tests/runtime/out_stackdriver.c

📝 Walkthrough

Walkthrough

This PR centralizes record validation in the Stackdriver plugin by adding a should_skip_record helper to validate insertId and payload labels type, replacing inlined checks; it also adds test data macros and seven new tests covering invalid-labels and mixed-batch scenarios.

Changes

Cohort / File(s) Summary
Stackdriver Plugin Validation Refactor
plugins/out_stackdriver/stackdriver.c
Adds should_skip_record(ctx, obj, log_errors) to centralize insertId and payload labels validation. Replaces per-record inline checks during prescan and entry processing, logs errors conditionally, and adds a batch-level warning when all entries are skipped.
Test Data: invalid labels
tests/runtime/data/stackdriver/stackdriver_test_labels.h
Adds seven test macros representing invalid-labels scenarios (labels as non-map, custom key variants, records with extracted fields, and mixed/fully-invalid batch variants).
Runtime tests: Stackdriver
tests/runtime/out_stackdriver.c
Adds seven new exported test functions and corresponding TEST_LIST entries to exercise invalid-labels handling and mixed-batch behaviors.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • Issue #11541: Matches the centralization of per-record skipping via should_skip_record, converting batch-fatal invalid-labels behavior to per-record skipping.

Suggested reviewers

  • braydonk

Poem

🐇 I hop through records, sniffing each line,
If labels are wrong, I skip them just fine,
insertId checked, no more batch-wide fright,
Tests bloom like carrots in morning light,
Hooray — validation neat and bright! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.76% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main fix: addressing batch dropping when invalid labels are encountered in the stackdriver plugin.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan for PR comments
  • Generate coding plan

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@erain erain force-pushed the fix/stackdriver-record-drop branch 3 times, most recently from fcf50e3 to 7928a60 Compare March 11, 2026 20:54
@erain erain marked this pull request as ready for review March 11, 2026 22:02
@erain erain requested a review from braydonk as a code owner March 11, 2026 22:02
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/out_stackdriver/stackdriver.c (1)

1927-1929: ⚠️ Potential issue | 🟡 Minor

Log a summary reason before returning on all-invalid batches.

If prescan rejects every record, the function returns before the logged pass, so there is no message explaining that all entries were skipped for invalid insertId or labels. Emitting one summary warning here would make this failure mode diagnosable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/out_stackdriver/stackdriver.c` around lines 1927 - 1929, When prescan
leaves no valid records (array_size == 0) add a summary warning log immediately
before the early return to explain why the batch is dropped (e.g., "all entries
skipped: invalid insertId or labels"); use the same logging facility already
used in this file (the local logger used elsewhere in stackdriver.c) and
reference the prescan result/array_size check so the message is emitted whenever
array_size == 0 instead of returning silently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/out_stackdriver/stackdriver.c`:
- Around line 1917-1923: The loop that decrements array_size when
should_skip_record() returns true adjusts the emitted count locally but
cb_stackdriver_flush() still reports event_chunk->total_events; update the code
to propagate the actual emitted/processed count to cb_stackdriver_flush() (e.g.,
compute emitted_count by starting from event_chunk->total_events and
decrementing for each skipped record inside the
flb_log_event_decoder_next()/should_skip_record() loop or accumulate
emitted_count directly) and change metric reporting in cb_stackdriver_flush() to
use this emitted_count (or a new parameter name you add) instead of
event_chunk->total_events so dropped records are reflected in flush and
dropped-record metrics.

---

Outside diff comments:
In `@plugins/out_stackdriver/stackdriver.c`:
- Around line 1927-1929: When prescan leaves no valid records (array_size == 0)
add a summary warning log immediately before the early return to explain why the
batch is dropped (e.g., "all entries skipped: invalid insertId or labels"); use
the same logging facility already used in this file (the local logger used
elsewhere in stackdriver.c) and reference the prescan result/array_size check so
the message is emitted whenever array_size == 0 instead of returning silently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d8e76ba-c3cc-4894-b540-33bba85548b2

📥 Commits

Reviewing files that changed from the base of the PR and between a1d9c2a and 7928a60.

⛔ Files ignored due to path filters (4)
  • tests/runtime/data/stackdriver/stackdriver_batch_all_labels_not_a_map.log is excluded by !**/*.log
  • tests/runtime/data/stackdriver/stackdriver_batch_first_record_labels_not_a_map.log is excluded by !**/*.log
  • tests/runtime/data/stackdriver/stackdriver_batch_labels_not_a_map.log is excluded by !**/*.log
  • tests/runtime/data/stackdriver/stackdriver_batch_mixed_errors.log is excluded by !**/*.log
📒 Files selected for processing (3)
  • plugins/out_stackdriver/stackdriver.c
  • tests/runtime/data/stackdriver/stackdriver_test_labels.h
  • tests/runtime/out_stackdriver.c

Comment on lines 1917 to 1923
while ((ret = flb_log_event_decoder_next(
&log_decoder,
&log_event)) == FLB_EVENT_DECODER_SUCCESS) {
/* Extract insertId */
in_status = validate_insert_id(&insert_id_obj, log_event.body);

if (in_status == INSERTID_INVALID) {
flb_plg_error(ctx->ins,
"Incorrect insertId received. InsertId should be non-empty string.");
if (should_skip_record(ctx, log_event.body, FLB_FALSE) == FLB_TRUE) {
array_size -= 1;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Propagate skipped-record count to the flush metrics.

array_size is reduced here, but the flush path still reports event_chunk->total_events as processed. After this change, a batch with locally skipped records will be counted as fully successful on HTTP 200, and dropped-record metrics will miss the records discarded by should_skip_record(). Please surface the emitted count to cb_stackdriver_flush() and use that for metrics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/out_stackdriver/stackdriver.c` around lines 1917 - 1923, The loop
that decrements array_size when should_skip_record() returns true adjusts the
emitted count locally but cb_stackdriver_flush() still reports
event_chunk->total_events; update the code to propagate the actual
emitted/processed count to cb_stackdriver_flush() (e.g., compute emitted_count
by starting from event_chunk->total_events and decrementing for each skipped
record inside the flb_log_event_decoder_next()/should_skip_record() loop or
accumulate emitted_count directly) and change metric reporting in
cb_stackdriver_flush() to use this emitted_count (or a new parameter name you
add) instead of event_chunk->total_events so dropped records are reflected in
flush and dropped-record metrics.

When a single record contains a logging.googleapis.com/labels field
that is not a map, the entire batch is dropped. This causes data loss
for all other valid records in the batch.

Extract a shared should_skip_record() helper that validates both
insertId and labels fields. Use it in the prescan loop (for array
sizing) and at the top of the main packing loop (before any field
extraction), so invalid records are skipped cleanly without complex
cleanup paths.

This change only addresses the labels-not-a-map case. Other batch-
level drop scenarios (k8s local_resource_id failures, decoder init
failures, JSON serialization failures) remain unchanged as they
affect shared batch-level state by design.

Signed-off-by: Yu Yi <yiyu@yiyu.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant