out_stackdriver: fix batch drop on invalid labels#11539
out_stackdriver: fix batch drop on invalid labels#11539erain wants to merge 1 commit intofluent:masterfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (4)
📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR centralizes record validation in the Stackdriver plugin by adding a Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan for PR comments
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
fcf50e3 to
7928a60
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/out_stackdriver/stackdriver.c (1)
1927-1929:⚠️ Potential issue | 🟡 MinorLog a summary reason before returning on all-invalid batches.
If prescan rejects every record, the function returns before the logged pass, so there is no message explaining that all entries were skipped for invalid
insertIdor labels. Emitting one summary warning here would make this failure mode diagnosable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/out_stackdriver/stackdriver.c` around lines 1927 - 1929, When prescan leaves no valid records (array_size == 0) add a summary warning log immediately before the early return to explain why the batch is dropped (e.g., "all entries skipped: invalid insertId or labels"); use the same logging facility already used in this file (the local logger used elsewhere in stackdriver.c) and reference the prescan result/array_size check so the message is emitted whenever array_size == 0 instead of returning silently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/out_stackdriver/stackdriver.c`:
- Around line 1917-1923: The loop that decrements array_size when
should_skip_record() returns true adjusts the emitted count locally but
cb_stackdriver_flush() still reports event_chunk->total_events; update the code
to propagate the actual emitted/processed count to cb_stackdriver_flush() (e.g.,
compute emitted_count by starting from event_chunk->total_events and
decrementing for each skipped record inside the
flb_log_event_decoder_next()/should_skip_record() loop or accumulate
emitted_count directly) and change metric reporting in cb_stackdriver_flush() to
use this emitted_count (or a new parameter name you add) instead of
event_chunk->total_events so dropped records are reflected in flush and
dropped-record metrics.
---
Outside diff comments:
In `@plugins/out_stackdriver/stackdriver.c`:
- Around line 1927-1929: When prescan leaves no valid records (array_size == 0)
add a summary warning log immediately before the early return to explain why the
batch is dropped (e.g., "all entries skipped: invalid insertId or labels"); use
the same logging facility already used in this file (the local logger used
elsewhere in stackdriver.c) and reference the prescan result/array_size check so
the message is emitted whenever array_size == 0 instead of returning silently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8d8e76ba-c3cc-4894-b540-33bba85548b2
⛔ Files ignored due to path filters (4)
tests/runtime/data/stackdriver/stackdriver_batch_all_labels_not_a_map.logis excluded by!**/*.logtests/runtime/data/stackdriver/stackdriver_batch_first_record_labels_not_a_map.logis excluded by!**/*.logtests/runtime/data/stackdriver/stackdriver_batch_labels_not_a_map.logis excluded by!**/*.logtests/runtime/data/stackdriver/stackdriver_batch_mixed_errors.logis excluded by!**/*.log
📒 Files selected for processing (3)
plugins/out_stackdriver/stackdriver.ctests/runtime/data/stackdriver/stackdriver_test_labels.htests/runtime/out_stackdriver.c
| while ((ret = flb_log_event_decoder_next( | ||
| &log_decoder, | ||
| &log_event)) == FLB_EVENT_DECODER_SUCCESS) { | ||
| /* Extract insertId */ | ||
| in_status = validate_insert_id(&insert_id_obj, log_event.body); | ||
|
|
||
| if (in_status == INSERTID_INVALID) { | ||
| flb_plg_error(ctx->ins, | ||
| "Incorrect insertId received. InsertId should be non-empty string."); | ||
| if (should_skip_record(ctx, log_event.body, FLB_FALSE) == FLB_TRUE) { | ||
| array_size -= 1; | ||
| } | ||
| } |
There was a problem hiding this comment.
Propagate skipped-record count to the flush metrics.
array_size is reduced here, but the flush path still reports event_chunk->total_events as processed. After this change, a batch with locally skipped records will be counted as fully successful on HTTP 200, and dropped-record metrics will miss the records discarded by should_skip_record(). Please surface the emitted count to cb_stackdriver_flush() and use that for metrics.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/out_stackdriver/stackdriver.c` around lines 1917 - 1923, The loop
that decrements array_size when should_skip_record() returns true adjusts the
emitted count locally but cb_stackdriver_flush() still reports
event_chunk->total_events; update the code to propagate the actual
emitted/processed count to cb_stackdriver_flush() (e.g., compute emitted_count
by starting from event_chunk->total_events and decrementing for each skipped
record inside the flb_log_event_decoder_next()/should_skip_record() loop or
accumulate emitted_count directly) and change metric reporting in
cb_stackdriver_flush() to use this emitted_count (or a new parameter name you
add) instead of event_chunk->total_events so dropped records are reflected in
flush and dropped-record metrics.
When a single record contains a logging.googleapis.com/labels field that is not a map, the entire batch is dropped. This causes data loss for all other valid records in the batch. Extract a shared should_skip_record() helper that validates both insertId and labels fields. Use it in the prescan loop (for array sizing) and at the top of the main packing loop (before any field extraction), so invalid records are skipped cleanly without complex cleanup paths. This change only addresses the labels-not-a-map case. Other batch- level drop scenarios (k8s local_resource_id failures, decoder init failures, JSON serialization failures) remain unchanged as they affect shared batch-level state by design. Signed-off-by: Yu Yi <yiyu@yiyu.me>
7928a60 to
faa836c
Compare
Description
When a single record contains a
logging.googleapis.com/labelsfield that is not a map, theout_stackdriverplugin currently drops the entire batch. This causes data loss for all other valid records in that batch.Root Cause
In
stackdriver_format(), the labels type check had two problems:destroy_http_request(), causing a memory leak).Fix
Extracted a shared
should_skip_record()helper that validates bothinsertIdandlabelsin one place:This helper is now used:
log_errors=FLB_FALSE) to correctly computearray_size(entries count).log_errors=FLB_TRUE), before any field extraction, so invalid records are skipped cleanly with no cleanup needed.Scope
The following batch-level drop scenarios remain unchanged as they affect shared batch-level state by design:
flb_log_event_decoder_init()failureflb_log_event_decoder_next()failurelocal_resource_idextraction/processing failures (k8s_container,k8s_node,k8s_pod)process_local_resource_id()failuresTesting
New Tests (7 total)
labels_not_a_maplabels_not_a_map_custom_keylabels_keywith string value → no outputlabels_not_a_map_with_extracted_fieldsbatch_labels_not_a_mapbatch_first_record_labels_not_a_mapbatch_all_records_labels_not_a_mapbatch_mixed_errorsBatch tests with content assertions (
batch_labels_not_a_map,batch_first_record_labels_not_a_map,batch_mixed_errors) use tail/file-backed fixtures to guarantee single-batch semantics. Thebatch_all_records_labels_not_a_maptest uses a fixture file as well but only asserts no output.Regression Tests (all pass)
insertId_common_case,empty_insertId,insertId_incorrect_type_int,default_labels,custom_labels,config_labels_conflict,config_labels_no_conflict,default_labels_k8s_resource_type,custom_labels_k8s_resource_type,severity_multi_entriesSummary by CodeRabbit
Bug Fixes
Tests