in_amqp: Initial implementation for in_amqp plugin #11427
matwey wants to merge 1 commit into fluent:master
📝 Walkthrough
Adds a new AMQP input plugin: build gating for rabbitmq-c, plugin registration and CMake entries, public header types for connection/plugin state, and a full plugin implementation handling AMQP connections, consumption, parsing, encoding, retry/reconnect, and lifecycle callbacks.
Sequence Diagrams
sequenceDiagram
participant FLB as Fluent Bit
participant Plugin as in_amqp Plugin
participant Broker as AMQP Broker
participant Parser as Parser
participant Encoder as Encoder
FLB->>Plugin: in_amqp_init()
Plugin->>Plugin: load config (uri, queue, parser, reconnect)
Plugin->>Broker: open socket / login / open channel / basic_consume
Plugin->>Parser: init parser (if configured)
Plugin->>Encoder: init encoder
Plugin->>FLB: register collect callback
sequenceDiagram
participant Collector as in_amqp collect loop
participant Broker as AMQP Broker
participant Parser as Parser
participant Encoder as Encoder
participant FLB as Fluent Bit
Collector->>Broker: amqp_consume_message() / poll
alt delivery received
Broker->>Collector: envelope (body, headers, props, routing_key)
Collector->>Parser: parse body (optional)
Parser-->>Collector: parsed record / error
Collector->>Collector: attach metadata (routing_key, headers, timestamp)
Collector->>Encoder: encode event
Encoder-->>Collector: msgpack buffer
Collector->>FLB: emit record
else error / connection loss
Broker-->>Collector: error / timeout
Collector->>Collector: schedule reconnect via retry collector
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ead056ea6c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@plugins/in_amqp/in_amqp.c`:
- Around line 579-583: The cast in in_amqp_exit improperly dereferences the
unused parameter using "(void) *config;", which can cause undefined behavior if
config is NULL; change that to the standard unused-parameter idiom "(void)
config;" inside the in_amqp_exit function so you do not dereference config while
keeping the variable marked as intentionally unused.
- Around line 446-516: The ctx struct allocated in in_amqp_init is created with
flb_malloc so its members (notably ctx->encoder) are uninitialized and later
in_amqp_config_destroy calls flb_log_event_encoder_destroy which can free
invalid pointers; change the allocation in in_amqp_init to use flb_calloc (or
allocate with flb_malloc followed by memset to zero) so struct flb_amqp fields
(including ctx->encoder) are zero-initialized before any early teardown; keep
all other initialization logic the same.
- Around line 170-176: The flb_plg_trace call logs AMQP body using "%s" which
assumes NUL-termination and can read past the buffer; change the trace to use a
length-limited format (use "%.*s" with body->len) so it prints at most body->len
bytes from body->bytes. Update the flb_plg_trace invocation(s) in the parsing
block around flb_parser_do (referencing ctx->parser, flb_parser_do, body->bytes,
body->len, and ctx->ins) to pass (int)body->len and body->bytes accordingly and
keep the flb_plg_error call as-is.
- Around line 182-218: The code reads optional fields from
amqp_basic_properties_t (properties->timestamp, content_type, content_encoding,
correlation_id, reply_to, headers) without checking the properties->_flags
bitmask; update the logic around flb_time_set(&out_time, properties->timestamp,
0) and each call to in_amqp_append_metadata_bytes/in_amqp_append_metadata_entry
so you first test the appropriate flag (AMQP_BASIC_TIMESTAMP_FLAG,
AMQP_BASIC_CONTENT_TYPE_FLAG, AMQP_BASIC_CONTENT_ENCODING_FLAG,
AMQP_BASIC_CORRELATION_ID_FLAG, AMQP_BASIC_REPLY_TO_FLAG,
AMQP_BASIC_HEADERS_FLAG) before accessing the field; only call flb_time_set or
in_amqp_append_metadata_* when the corresponding flag is set, otherwise skip to
the fallback (flb_time_get) or omit that metadata entry to avoid reading
uninitialized memory (refer to symbols: properties, properties->_flags,
AMQP_BASIC_*_FLAG, out_time, flb_time_set, flb_time_get,
in_amqp_append_metadata_bytes, in_amqp_append_metadata_entry, headers,
ctx->encoder).
- Around line 350-386: The AMQP_RESPONSE_LIBRARY_EXCEPTION branch currently
handles only UNEXPECTED_STATE, CONNECTION_CLOSED and TIMEOUT; other
ret.library_error values from amqp_consume_message() should be handled to avoid
tight CPU loops—add an else clause after the existing checks that logs the
unexpected ret.library_error via flb_plg_warn (including the numeric error),
tears down the connection with in_amqp_connection_destroy(ctx) and either return
a non-zero error or call in_amqp_consumer_start(in, config, in_context) to
trigger a reconnect/backoff; reference the amqp_consume_message() return
handling, ret.library_error, AMQP_RESPONSE_LIBRARY_EXCEPTION, flb_plg_warn,
in_amqp_connection_destroy and in_amqp_consumer_start when applying the change.
- Around line 398-423: When reconnecting, the old socket collector
(ctx->coll_id) is left registered and must be removed before creating a new one;
modify the reconnect path (in the code that calls in_amqp_connection_init and
then sets ctx->coll_id) to check if ctx->coll_id >= 0 and call
flb_input_collector_destroy(ctx->coll_id, in) (or the appropriate
collector-remove API used elsewhere) to delete the stale collector, then clear
ctx->coll_id and proceed to create the new collector with
flb_input_set_collector_socket (refer to ctx->coll_id, in_amqp_collect,
amqp_socket_get_sockfd and in_amqp_connection_init).
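The `_flags` item in the list above can be illustrated with a minimal sketch. It assumes stand-in types: `demo_properties`, `DEMO_TIMESTAMP_FLAG` and `demo_event_time` are hypothetical names that only mimic the shape of `amqp_basic_properties_t` and its bitmask; the real flag values come from the rabbitmq-c headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real AMQP_BASIC_*_FLAG values differ. */
#define DEMO_TIMESTAMP_FLAG (1u << 6)

/* Hypothetical stand-in for the message properties struct. */
struct demo_properties {
    uint32_t flags;      /* bitmask of the fields that are actually set */
    uint64_t timestamp;  /* only meaningful when DEMO_TIMESTAMP_FLAG is set */
};

/*
 * Return the message timestamp when the sender supplied one,
 * otherwise fall back to the caller-provided current time.
 */
static uint64_t demo_event_time(const struct demo_properties *p, uint64_t now)
{
    assert(p != NULL);

    if (p->flags & DEMO_TIMESTAMP_FLAG) {
        return p->timestamp;
    }
    return now;
}
```

The same test-then-read shape would apply to each optional field (content type, correlation id, headers, and so on), with the fallback being to omit the metadata entry.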
Force-pushed from 3a5b411 to d3d939e.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@plugins/in_amqp/in_amqp.c`:
- Around line 421-433: When collector creation or start fails, the code must
clean up the partially-created collector and the open AMQP connection to avoid
leaving the plugin stuck: if flb_input_set_collector_socket succeeded but
flb_input_collector_start failed, call flb_input_collector_remove(ctx->coll_id)
(or the equivalent collector-remove API) and set ctx->coll_id = -1; in both
failure branches close the underlying socket (c->sock) and tear down the AMQP
connection using the library cleanup functions (e.g., amqp_socket_close(c->sock)
and the appropriate amqp connection destroy/close function), clear any
connection pointers in ctx, and then return the error.
- Around line 346-386: in in_amqp_collect(), when amqp_consume_message() returns
AMQP_RESPONSE_SERVER_EXCEPTION or AMQP_RESPONSE_NONE, call
in_amqp_log_reply_error(&ret, in) to record the error, then call
in_amqp_connection_destroy(ctx) and restart the consumer by returning
in_amqp_consumer_start(in, config, in_context); this mirrors the existing
library-exception handling flow and prevents a busy spin; reference the
amqp_consume_message return handling, in_amqp_log_reply_error(),
in_amqp_connection_destroy(), and in_amqp_consumer_start().
- Around line 592-595: The default AMQP URI "amqp://" is missing a HOST and will
cause amqp_parse_url to fail; update the default for the config map entry that
sets struct flb_amqp->uri (the FLB_CONFIG_MAP_STR for "uri") to a parseable
value (either NULL to allow amqp_default_connection_info or a concrete
"amqp://localhost") and ensure any code that reads flb_amqp->uri and calls
amqp_parse_url handles NULL appropriately by falling back to
amqp_default_connection_info; locate the config map entry (FLB_CONFIG_MAP_STR,
"uri", "amqp://") and change the default to NULL or "amqp://localhost" and
adjust initialization paths that call amqp_parse_url to accommodate the new
default.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@plugins/in_amqp/in_amqp.c`:
- Around line 139-141: Fix the typo in the comment inside the switch handling
AMQP field kinds: change the comment "fallthrought" to the correct "fallthrough"
where the case AMQP_FIELD_KIND_UTF8 falls into case AMQP_FIELD_KIND_BYTES so the
intent is clear; update the comment text near the AMQP_FIELD_KIND_UTF8 /
AMQP_FIELD_KIND_BYTES case labels to the correct spelling.
- Around line 413-451: in in_amqp_consumer_start: remove the direct call to
in_amqp_collect (the mutual recursion risk) and instead finish setup after
resetting ctx->retry and pausing the retry collector; rely on the socket/IO
collector you registered earlier to invoke in_amqp_collect naturally.
Specifically, stop calling in_amqp_collect(in, config, ctx) here, keep the
flb_input_collector_pause(ctx->retry_coll_id, in) and return 0 so the existing
socket collector will handle pending messages; ensure no other code path assumes
in_amqp_consumer_start will synchronously process the queue.
- Around line 276-327: The loop in in_amqp_collect can recurse unbounded by
calling in_amqp_consumer_start on AMQP_STATUS_CONNECTION_CLOSED; change the
behavior so that when ret.library_error == AMQP_STATUS_CONNECTION_CLOSED you
call in_amqp_connection_destroy(ctx) and break out of the for(;;) loop (or
return an error code) instead of calling in_amqp_consumer_start to avoid
recursion, letting the outer retry/timer logic handle reconnects; also add
catch-all branches: handle other ret.reply_type values
(AMQP_RESPONSE_SERVER_EXCEPTION, AMQP_RESPONSE_NONE) by logging and
breaking/returning appropriately, and add an else for ret.library_error to log
the unexpected error and break/return so the loop does not spin indefinitely.
- Around line 566-572: in in_amqp_pause ensure you only call
flb_input_collector_pause for the connection collector when a connection exists:
check ctx->conn.coll_id != -1 before invoking
flb_input_collector_pause(ctx->conn.coll_id, ctx->ins); still pause
ctx->retry_coll_id unconditionally as before. This mirrors the guard used in
in_amqp_resume and prevents attempting to pause a non-existent collector.
- Around line 585-593: Add a NULL guard in in_amqp_exit: check whether the
incoming data pointer is NULL before casting to struct flb_amqp *ctx and calling
in_amqp_config_destroy; if data is NULL simply return 0. This ensures
in_amqp_config_destroy (which dereferences ctx->encoder, ctx->retry_coll_id,
ctx->ins) is only called when ctx is non-NULL and avoids null pointer
dereference when the framework invokes cb_exit with a NULL context.
    for (;;) {
        amqp_rpc_reply_t ret;
        amqp_envelope_t envelope;

        amqp_maybe_release_buffers(c->conn);
        ret = amqp_consume_message(c->conn, &envelope, &tv, 0);

        if (ret.reply_type == AMQP_RESPONSE_NORMAL) {
            in_amqp_handle_envelope(ctx, &envelope);

            amqp_destroy_envelope(&envelope);
        } else if (ret.reply_type == AMQP_RESPONSE_LIBRARY_EXCEPTION) {
            /*
             * If ret.reply_type == AMQP_RESPONSE_LIBRARY_EXCEPTION, and
             * ret.library_error == AMQP_STATUS_UNEXPECTED_STATE, a frame
             * other than AMQP_BASIC_DELIVER_METHOD was received, the caller
             * should call amqp_simple_wait_frame() to read this frame and
             * take appropriate action.
             */
            if (ret.library_error == AMQP_STATUS_UNEXPECTED_STATE) {
                if (amqp_simple_wait_frame(c->conn, &frame) != AMQP_STATUS_OK) {
                    break;
                }

                if (frame.frame_type == AMQP_FRAME_METHOD) {
                    switch (frame.payload.method.id) {
                        case AMQP_CHANNEL_CLOSE_METHOD: {
                            amqp_channel_close_t *m = (amqp_channel_close_t *)frame.payload.method.decoded;
                            flb_plg_warn(in, "AMQP server channel error %hu, message: %.*s", m->reply_code, (int)m->reply_text.len, (char *)m->reply_text.bytes);
                            break;
                        }
                        case AMQP_CONNECTION_CLOSE_METHOD: {
                            amqp_connection_close_t *m = (amqp_connection_close_t *)frame.payload.method.decoded;
                            flb_plg_warn(in, "AMQP server connection error %hu, message: %.*s", m->reply_code, (int)m->reply_text.len, (char *)m->reply_text.bytes);
                            break;
                        }
                        default:
                            flb_plg_warn(in, "An unexpected AMQP method id 0x%08X", frame.payload.method.id);
                    }
                }
            } else if (ret.library_error == AMQP_STATUS_CONNECTION_CLOSED) {
                in_amqp_connection_destroy(ctx);

                return in_amqp_consumer_start(in, config, in_context);
            } else if (ret.library_error == AMQP_STATUS_TIMEOUT) {
                return 0;
            }
        }
    }

    return 0;
}
Unbounded recursion between in_amqp_collect and in_amqp_consumer_start.
When AMQP_STATUS_CONNECTION_CLOSED is detected at line 317, the code calls in_amqp_consumer_start, which on success calls in_amqp_collect again (line 443). If the new connection also drops immediately, this chain recurses indefinitely, risking a stack overflow.
Additionally, unhandled reply_type values (AMQP_RESPONSE_SERVER_EXCEPTION, AMQP_RESPONSE_NONE) and unhandled library_error values (anything other than UNEXPECTED_STATE, CONNECTION_CLOSED, TIMEOUT) cause the for(;;) loop to spin without breaking or reconnecting.
Consider:
- Breaking out of `in_amqp_collect` on connection failure instead of recursing; let the retry timer handle reconnection.
- Adding a catch-all `else` for unhandled `library_error` values and an `else` for unhandled `reply_type` values.
Proposed fix (break instead of recurse; add catch-all handlers)
} else if (ret.library_error == AMQP_STATUS_CONNECTION_CLOSED) {
in_amqp_connection_destroy(ctx);
-
- return in_amqp_consumer_start(in, config, in_context);
+ /* Let the retry timer handle reconnection */
+ if (!flb_input_collector_running(ctx->retry_coll_id, in)) {
+ flb_input_collector_resume(ctx->retry_coll_id, in);
+ }
+ return 0;
} else if (ret.library_error == AMQP_STATUS_TIMEOUT) {
return 0;
+ } else {
+ flb_plg_error(in, "AMQP consume error: %s", amqp_error_string2(ret.library_error));
+ in_amqp_connection_destroy(ctx);
+ if (!flb_input_collector_running(ctx->retry_coll_id, in)) {
+ flb_input_collector_resume(ctx->retry_coll_id, in);
+ }
+ return 0;
}
+ } else {
+ in_amqp_log_reply_error(in, ret, "AMQP consume error");
+ in_amqp_connection_destroy(ctx);
+ if (!flb_input_collector_running(ctx->retry_coll_id, in)) {
+ flb_input_collector_resume(ctx->retry_coll_id, in);
+ }
+ return 0;
}
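One way to read the suggestion above is as a pure dispatch step: every reply/error pair maps to exactly one next action, and none of those actions is "call back into the collector". The sketch below is a self-contained illustration under that reading; all `demo_*` and `DEMO_*` names are hypothetical stand-ins for the rabbitmq-c reply types and status codes, not the library's own definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the reply types and library errors. */
enum demo_reply { DEMO_NORMAL, DEMO_LIBRARY_EXCEPTION, DEMO_SERVER_EXCEPTION, DEMO_NONE };
enum demo_liberr { DEMO_OK, DEMO_TIMEOUT, DEMO_UNEXPECTED_STATE, DEMO_CONNECTION_CLOSED, DEMO_OTHER };

/* What the collect loop should do next. */
enum demo_action {
    DEMO_CONTINUE,    /* keep consuming */
    DEMO_READ_FRAME,  /* read the non-delivery frame, then decide */
    DEMO_YIELD,       /* queue drained: return to the event loop */
    DEMO_RECONNECT    /* tear down and let the retry timer reconnect */
};

static enum demo_action demo_next_action(enum demo_reply reply, enum demo_liberr err)
{
    switch (reply) {
    case DEMO_NORMAL:
        return DEMO_CONTINUE;
    case DEMO_LIBRARY_EXCEPTION:
        switch (err) {
        case DEMO_TIMEOUT:
            return DEMO_YIELD;
        case DEMO_UNEXPECTED_STATE:
            return DEMO_READ_FRAME;
        default:
            /* connection closed and every unlisted library error */
            return DEMO_RECONNECT;
        }
    default:
        /* server exception, empty reply, or anything unknown */
        return DEMO_RECONNECT;
    }
}

/* Usage sketch: no input leaves the loop without a decision. */
static void demo_check_dispatch(void)
{
    assert(demo_next_action(DEMO_LIBRARY_EXCEPTION, DEMO_TIMEOUT) == DEMO_YIELD);
    assert(demo_next_action(DEMO_SERVER_EXCEPTION, DEMO_OK) == DEMO_RECONNECT);
}
```

Because the two `default` branches cover everything not listed, the loop can neither spin on an unknown code nor recurse: `DEMO_RECONNECT` would only resume the retry collector and return.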
✅ Actions performed: Reviews resumed.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@plugins/in_amqp/in_amqp.c`:
- Around line 451-459: in in_amqp_config_destroy, guard the call to
flb_input_collector_delete by checking ctx->retry_coll_id is valid (not -1)
before calling it; update the function handling for struct flb_amqp (used by
in_amqp_init and cleanup paths) to only call
flb_input_collector_delete(ctx->retry_coll_id, ctx->ins) when ctx->retry_coll_id
has been initialized to a non -1 value, leaving other cleanup
(flb_log_event_encoder_destroy, in_amqp_connection_destroy, flb_free) unchanged.
- Around line 327-339: in_amqp_connection_destroy currently calls
flb_input_collector_delete unconditionally which can be invoked with c->coll_id
== -1; update the function to check c->coll_id != -1 before calling
flb_input_collector_delete and only reset c->coll_id to -1 after successful
deletion; locate the logic in in_amqp_connection_destroy that references struct
flb_amqp_connection *c = &ctx->conn, the c->conn check, and the
flb_input_collector_delete(c->coll_id, ctx->ins) call to add the guard.
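Both items above ask for the same guard: only delete a collector whose id is valid, and reset the id once it is gone. A minimal self-contained sketch, assuming a hypothetical `demo_conn` struct and a simulated delete function in place of `flb_input_collector_delete`:

```c
#include <assert.h>

#define DEMO_NO_COLLECTOR (-1)

/* Hypothetical stand-in for a connection that owns one collector id. */
struct demo_conn {
    int coll_id;       /* DEMO_NO_COLLECTOR when nothing is registered */
    int delete_calls;  /* counts calls to the simulated delete API */
};

/* Simulated delete; stands in for the real collector-delete call. */
static int demo_collector_delete(struct demo_conn *c, int id)
{
    assert(id >= 0);   /* deleting an invalid id is a caller bug */
    c->delete_calls++;
    return 0;
}

/* Safe to call repeatedly, and on a connection that never got a collector. */
static void demo_conn_destroy(struct demo_conn *c)
{
    if (c->coll_id != DEMO_NO_COLLECTOR) {
        if (demo_collector_delete(c, c->coll_id) == 0) {
            c->coll_id = DEMO_NO_COLLECTOR;  /* reset only after success */
        }
    }
}
```

With the sentinel reset in place, early-teardown paths and repeated reconnects can all call the destroy function without tracking whether a collector was ever created.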
@coderabbitai resume
✅ Actions performed: Reviews resumed.
@codex review
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@plugins/in_amqp/in_amqp.c`:
- Line 301: Fix the three English typos in the in_amqp.c messages: change the
comment/string "All messages has been processed" to "All messages have been
processed", change "An error occured during waiting frame" to "An error occurred
during waiting frame", and change "An error occured during consuming message" to
"An error occurred during consuming message" (update the literal
strings/comments where they appear in in_amqp.c so both logs and comments
reflect the corrected spelling).
- Around line 288-342: The loop in in_amqp_collect can spin unbounded if
in_amqp_consumer_start succeeds but the broker immediately drops the new
connection; to fix, after detecting a successful in_amqp_consumer_start (the
call to in_amqp_consumer_start(ctx, config) following
in_amqp_connection_destroy), break out of the for(;;) so the socket collector
can re-schedule collection instead of retrying inline, or alternatively
implement a local reconnect limit/backoff counter in in_amqp_collect that
increments on consecutive immediate failures and returns an error or sleeps when
the limit is reached; update references to c->conn and ctx handling so you don't
continue consuming on a dropped connection.
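The second alternative above (a local reconnect limit) could look roughly like the sketch below. The limit value and every `demo_*` name are assumptions made for illustration; nothing here is taken from the plugin.

```c
#include <assert.h>

/* Hypothetical cap on back-to-back inline reconnects. */
#define DEMO_MAX_INLINE_RECONNECTS 3

struct demo_retry {
    int consecutive_failures;
};

/* Call after a message was consumed successfully. */
static void demo_retry_reset(struct demo_retry *r)
{
    r->consecutive_failures = 0;
}

/*
 * Call when the connection dropped. Returns 1 if an inline reconnect is
 * still allowed, 0 if the caller should stop and leave reconnection to
 * the retry timer.
 */
static int demo_retry_allow_inline(struct demo_retry *r)
{
    assert(r->consecutive_failures >= 0);

    if (r->consecutive_failures >= DEMO_MAX_INLINE_RECONNECTS) {
        return 0;
    }
    r->consecutive_failures++;
    return 1;
}
```

Breaking out of the loop right after a successful reconnect, as the first alternative suggests, is simpler and needs no counter; the counter only matters if inline retries are kept.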
🧹 Nitpick comments (2)
plugins/in_amqp/in_amqp.c (2)
213-227: Missing `rollback_map` on metadata header encoding failure.
If an entry fails during the loop (lines 220-222) or `begin_map` succeeds but `commit_map` is never reached, the map is left uncommitted. While `flb_log_event_encoder_reset` at line 263 will clean up the whole record, adding a rollback on error is the correct pattern used elsewhere (e.g., `flb_opentelemetry_logs.c`, `kubernetes.c`) and prevents potential encoder state inconsistencies.
Proposed fix
 if (ret == FLB_EVENT_ENCODER_SUCCESS && (properties->_flags & AMQP_BASIC_HEADERS_FLAG)) {
     ret = flb_log_event_encoder_append_metadata_cstring(&ctx->encoder, "headers");
     if (ret == FLB_EVENT_ENCODER_SUCCESS) {
         ret = flb_log_event_encoder_metadata_begin_map((&ctx->encoder));
     }
     for (i = 0; i < headers->num_entries && ret == FLB_EVENT_ENCODER_SUCCESS; ++i) {
         ret = in_amqp_append_metadata_entry(&ctx->encoder, &headers->entries[i]);
     }
     if (ret == FLB_EVENT_ENCODER_SUCCESS) {
         ret = flb_log_event_encoder_metadata_commit_map((&ctx->encoder));
+    } else {
+        flb_log_event_encoder_metadata_rollback_map((&ctx->encoder));
     }
 }
608-617: Pause/resume only handles one collector; verify intent under edge cases.
`in_amqp_pause` pauses either the socket collector (if connected) or the retry collector (if not). If the framework calls `pause` during a transient state where the connection just dropped but the retry collector hasn't been resumed yet, neither collector gets paused. This window is narrow but exists between `in_amqp_connection_destroy` (which deletes the socket collector) and the subsequent `flb_input_collector_resume(ctx->retry_coll_id, ...)`.
Consider pausing both collectors unconditionally (guarded by validity checks) for robustness:
Proposed fix
 static void in_amqp_pause(void *data, struct flb_config *config)
 {
     struct flb_amqp *ctx = data;

-    if (ctx->conn.conn) {
-        flb_input_collector_pause(ctx->conn.coll_id, ctx->ins);
-    } else {
+    if (ctx->conn.coll_id >= 0) {
+        flb_input_collector_pause(ctx->conn.coll_id, ctx->ins);
+    }
+    if (ctx->retry_coll_id >= 0) {
         flb_input_collector_pause(ctx->retry_coll_id, ctx->ins);
     }
 }
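The `rollback_map` nitpick in this group comes down to pairing every successful "begin" with either a commit or a rollback. A toy model of that pairing, assuming a hypothetical `demo_encoder` that merely counts open maps (it is not the Fluent Bit log event encoder):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature encoder: tracks open maps and committed entries. */
struct demo_encoder {
    int open_maps;  /* maps begun but not yet committed or rolled back */
    int committed;  /* entries made visible by a commit */
    int pending;    /* entries appended to the currently open map */
};

static int demo_begin_map(struct demo_encoder *e)
{
    e->open_maps++;
    e->pending = 0;
    return 0;
}

static int demo_append(struct demo_encoder *e, int value)
{
    if (value < 0) {
        return -1;  /* simulate an entry that cannot be encoded */
    }
    e->pending++;
    return 0;
}

static void demo_commit_map(struct demo_encoder *e)
{
    e->committed += e->pending;
    e->pending = 0;
    e->open_maps--;
}

static void demo_rollback_map(struct demo_encoder *e)
{
    e->pending = 0;
    e->open_maps--;
}

/* Encode all values as one map; on any failure leave the encoder balanced. */
static int demo_encode_headers(struct demo_encoder *e, const int *values, size_t n)
{
    size_t i;
    int ret = demo_begin_map(e);

    for (i = 0; i < n && ret == 0; i++) {
        ret = demo_append(e, values[i]);
    }
    if (ret == 0) {
        demo_commit_map(e);
    } else {
        demo_rollback_map(e);
    }
    assert(e->open_maps == 0);  /* begin is always paired with commit or rollback */
    return ret;
}
```

In this model the begin step cannot fail; in the real code the rollback should presumably run only when the begin call itself succeeded.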
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b2377614c1
@coderabbitai resume
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@plugins/in_amqp/in_amqp.c`:
- Around line 167-176: The parser error path can leak memory because
flb_parser_do may allocate out_buf before returning an error; in the block that
checks ctx->parser and calls flb_parser_do (the one that currently logs then
returns ret), ensure any partially allocated out_buf is freed before returning:
initialize out_buf to NULL before the call, and on ret < 0 call
flb_free(out_buf) (or equivalent) before returning the error. Update the error
path in the function that contains ctx->parser/flb_parser_do so out_buf is
cleaned up and out_size/out_time handling remains consistent.
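The cleanup described above can be sketched without Fluent Bit at all. `demo_parse` below is a hypothetical parser that allocates its output even when it reports failure, mirroring the concern raised about `flb_parser_do`; whether the real parser behaves this way is the reviewer's claim, not something this sketch establishes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser that may allocate *out_buf even when it fails. */
static int demo_parse(const char *body, size_t len, char **out_buf, size_t *out_size)
{
    *out_buf = malloc(len + 1);
    if (*out_buf == NULL) {
        return -1;
    }
    memcpy(*out_buf, body, len);
    (*out_buf)[len] = '\0';
    *out_size = len;

    /* Treat an empty body, or one starting with '!', as a parse failure. */
    return (len == 0 || body[0] == '!') ? -1 : 0;
}

/* Returns the parsed size, or -1 on failure; never leaks out_buf. */
static long demo_handle_body(const char *body, size_t len)
{
    char *out_buf = NULL;  /* initialised so the error path can free it */
    size_t out_size = 0;
    long result;

    if (demo_parse(body, len, &out_buf, &out_size) < 0) {
        free(out_buf);     /* free(NULL) is a no-op */
        return -1;
    }
    assert(out_buf != NULL);
    result = (long) out_size;
    free(out_buf);
    return result;
}
```

Initialising the pointer to NULL is what makes the unconditional free on the error path safe regardless of how far the parser got.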
✅ Actions performed: Reviews resumed.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83ae35cb81
    in_amqp_log_reply_error(in, reply, "An error occurred during consuming message");

    in_amqp_connection_destroy(ctx);
Skip reconnect for non-close AMQP unexpected-state frames
After handling AMQP_STATUS_UNEXPECTED_STATE, this path always logs an error and destroys/recreates the connection. That means non-delivery frames that are not connection/channel-close (e.g., normal control/heartbeat traffic) still trigger a full reconnect cycle, which can cause repeated disconnect/reconnect churn and dropped throughput on otherwise healthy brokers. The branch should continue consuming for benign frames and only reconnect on definitive close/error conditions.
    queue_bytes.len = flb_sds_len(ctx->queue_name);
    queue_bytes.bytes = ctx->queue_name;
    if (amqp_basic_consume(c->conn, c->chan, queue_bytes, amqp_empty_bytes, 0, 1, 1, amqp_empty_table) == NULL) {
Disable forced exclusive mode in amqp_basic_consume
The consume call hardcodes exclusive=1, which asks RabbitMQ for an exclusive consumer every time. In queues that already have another consumer (including another Fluent Bit instance), the broker will reject the consume request and this plugin will loop through reconnect attempts instead of consuming. This should be non-exclusive by default (or configurable) to avoid breaking common shared-queue deployments.
This makes no sense. There cannot be two consumers for this queue, because that would lead to data loss.
🧹 Nitpick comments (3)
plugins/in_amqp/in_amqp.c (3)
538-543: Consider using `flb_calloc` for consistency with other input plugins.
The `flb_malloc` + `memset` approach is functionally correct, but `flb_calloc(1, sizeof(struct flb_amqp))` is the prevailing idiom in Fluent Bit input plugins and achieves the same zero-initialization in one call. The explicit `-1` assignments for `retry_coll_id` and `conn.coll_id` on lines 547 and 552 are still needed regardless.
608-628: Pause/resume should handle both collectors to avoid missed state transitions.
Currently, `in_amqp_pause` only pauses one collector based on whether a connection exists. If the plugin is paused while connected, only the socket collector is paused. If the connection drops while paused (e.g., TCP keepalive detects a dead connection; the socket collector is paused, so `in_amqp_collect` won't run), and then `in_amqp_resume` is called, it would check `ctx->conn.conn` (still non-NULL since nothing cleaned it up), resume only the socket collector (now pointing at a dead FD), and the retry collector would never be resumed.
This is an edge case, but unconditionally pausing/resuming both collectors (guarding against invalid `coll_id`) would be more robust:
Proposed fix
 static void in_amqp_pause(void *data, struct flb_config *config)
 {
     struct flb_amqp *ctx = data;

-    if (ctx->conn.conn) {
+    if (ctx->conn.coll_id >= 0) {
         flb_input_collector_pause(ctx->conn.coll_id, ctx->ins);
-    } else {
-        flb_input_collector_pause(ctx->retry_coll_id, ctx->ins);
     }
+    flb_input_collector_pause(ctx->retry_coll_id, ctx->ins);
 }

 static void in_amqp_resume(void *data, struct flb_config *config)
 {
     struct flb_amqp *ctx = data;

-    if (ctx->conn.conn) {
+    if (ctx->conn.coll_id >= 0) {
         flb_input_collector_resume(ctx->conn.coll_id, ctx->ins);
-    } else {
+    }
+    if (!ctx->conn.conn) {
         flb_input_collector_resume(ctx->retry_coll_id, ctx->ins);
     }
 }
509-511: `amqp_parse_url` mutates the URI string in-place, creating a fragile pattern.
`amqp_parse_url` inserts NUL bytes into its input buffer and stores pointers into it within `conn_info`. Since `ctx->uri` is a framework-managed `flb_sds_t` from the config map, this mutates the config string in-place. While this works correctly at runtime (the SDS length header preserves proper deallocation), it means `ctx->uri` is no longer a valid C-string after parsing. Although no logging of `ctx->uri` currently occurs after the parse, this creates a fragile pattern where future debug/config logging could show truncated values. Consider making a mutable copy to avoid this brittleness:
Proposed refactor
 if (ctx->uri) {
-    ret = amqp_parse_url(ctx->uri, &ctx->conn_info);
+    ctx->uri_buf = flb_sds_create(ctx->uri);
+    if (!ctx->uri_buf) {
+        return -1;
+    }
+    ret = amqp_parse_url(ctx->uri_buf, &ctx->conn_info);
     if (ret != AMQP_STATUS_OK) {
(with corresponding `flb_sds_destroy(ctx->uri_buf)` in `in_amqp_config_destroy`)
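The copy-before-parse idea in the last nitpick can be shown with a toy in-place parser. `demo_parse_in_place` is a hypothetical function that splits `host:port` by overwriting the colon, which mimics the destructive behaviour described for `amqp_parse_url`; the struct and field names are likewise illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-place parser: overwrites ':' with NUL and returns
 * pointers into the caller's buffer. */
static int demo_parse_in_place(char *buf, char **host, char **port)
{
    char *colon = strchr(buf, ':');

    if (colon == NULL) {
        return -1;
    }
    *colon = '\0';
    *host = buf;
    *port = colon + 1;
    return 0;
}

struct demo_cfg {
    const char *uri;  /* original setting, never modified */
    char *uri_buf;    /* private mutable copy handed to the parser */
    char *host;
    char *port;
};

static int demo_cfg_parse(struct demo_cfg *cfg)
{
    size_t len;

    assert(cfg->uri != NULL);
    len = strlen(cfg->uri);

    cfg->uri_buf = malloc(len + 1);
    if (cfg->uri_buf == NULL) {
        return -1;
    }
    memcpy(cfg->uri_buf, cfg->uri, len + 1);  /* includes the terminator */
    return demo_parse_in_place(cfg->uri_buf, &cfg->host, &cfg->port);
}

static void demo_cfg_destroy(struct demo_cfg *cfg)
{
    free(cfg->uri_buf);  /* host and port point into uri_buf */
    cfg->uri_buf = NULL;
}
```

The copy has to live as long as the parsed pointers are in use, which is why the proposed refactor frees it in the config destroy path rather than right after parsing.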
🧹 Nitpick comments (2)
plugins/in_amqp/in_amqp.c (2)
154-268: Solid envelope handling with proper `_flags` guards and encoder error chaining.
A few minor observations:
- Line 213: extra whitespace in `ret ==  FLB_EVENT_ENCODER_SUCCESS` (double space before `FLB_EVENT_ENCODER_SUCCESS`).
- If a header entry encoding fails mid-iteration (line 220-222), the map opened at line 217 is never explicitly rolled back via `flb_log_event_encoder_metadata_rollback_map`. The overall record won't be committed (guarded at line 250), and `encoder_reset` at line 263 will clean up, so this doesn't cause a data corruption bug. However, an explicit rollback on error would be more robust.
Neither is blocking; the encoder reset path handles cleanup correctly.
Minor: fix double space and add explicit map rollback
-    if (ret ==  FLB_EVENT_ENCODER_SUCCESS && (properties->_flags & AMQP_BASIC_HEADERS_FLAG)) {
+    if (ret == FLB_EVENT_ENCODER_SUCCESS && (properties->_flags & AMQP_BASIC_HEADERS_FLAG)) {
For robustness, consider adding an `else` branch after the commit_map:
     if (ret == FLB_EVENT_ENCODER_SUCCESS) {
         ret = flb_log_event_encoder_metadata_commit_map((&ctx->encoder));
+    } else {
+        flb_log_event_encoder_metadata_rollback_map((&ctx->encoder));
     }
657-665: Consider validating `reconnect.retry_interval` to prevent a rapid-fire timer.
A value of `0` for `reconnect.retry_interval` would create a timer with zero-second interval, potentially causing high CPU usage during reconnect cycles. A minimum (e.g., 1 second) would be a reasonable safeguard.
Suggested validation in in_amqp_configure
+    if (ctx->reconnect_retry_interval < 1) {
+        flb_plg_warn(in, "reconnect.retry_interval must be >= 1, defaulting to 1");
+        ctx->reconnect_retry_interval = 1;
+    }
+
     if (!ctx->queue_name) {
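The interval validation suggested above is a plain clamp. A self-contained sketch, where `DEMO_MIN_RETRY_INTERVAL` and `demo_retry_interval` are hypothetical names and the one-second floor is simply the value proposed in the comment:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MIN_RETRY_INTERVAL 1  /* seconds; hypothetical floor */

/*
 * Return a usable interval and report through *clamped whether the
 * configured value had to be raised, so the caller can log a warning.
 */
static int demo_retry_interval(int configured, int *clamped)
{
    assert(clamped != NULL);

    if (configured < DEMO_MIN_RETRY_INTERVAL) {
        *clamped = 1;
        return DEMO_MIN_RETRY_INTERVAL;
    }
    *clamped = 0;
    return configured;
}
```

Reporting the clamp separately keeps the helper free of logging, so the warning can stay in the configure function next to the other option checks.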
Signed-off-by: Matwey V. Kornilov <matwey.kornilov@gmail.com>
@coderabbitai resume
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/in_amqp/in_amqp.c`:
- Around line 586-603: The code sets the input context with
flb_input_set_context(in, ctx) but on the two late-init failure paths (after
flb_input_set_collector_time returning <0 and after in_amqp_consumer_start
returning <0) it calls in_amqp_config_destroy(ctx) without clearing in->context,
leaving a dangling pointer; update both failure branches to call
flb_input_set_context(in, NULL) (or otherwise clear the input context) before
calling in_amqp_config_destroy(ctx) and returning -1 so the freed ctx is not
referenced by the input instance.
📒 Files selected for processing (6)
- CMakeLists.txt
- cmake/plugins_options.cmake
- plugins/CMakeLists.txt
- plugins/in_amqp/CMakeLists.txt
- plugins/in_amqp/in_amqp.c
- plugins/in_amqp/in_amqp.h
🚧 Files skipped from review as they are similar to previous changes (4)
- plugins/CMakeLists.txt
- plugins/in_amqp/in_amqp.h
- CMakeLists.txt
- plugins/in_amqp/CMakeLists.txt
    flb_input_set_context(in, ctx);

    ctx->retry_coll_id = flb_input_set_collector_time(in, &in_amqp_reconnect, ctx->reconnect_retry_interval, 0, config);
    if (ctx->retry_coll_id < 0) {
        flb_plg_error(in, "Cannot create reconnection collector");
        in_amqp_config_destroy(ctx);

        return -1;
    }

    flb_input_collector_pause(ctx->retry_coll_id, in);

    ret = in_amqp_consumer_start(ctx, config);
    if (ret < 0) {
        flb_plg_error(in, "Cannot start AMQP consumer");
        in_amqp_config_destroy(ctx);

        return -1;
Clear the input context before freeing ctx on late init failures.
After flb_input_set_context(in, ctx) at Line 586, failure paths at Lines 589-593 and 599-603 free ctx but leave in->context pointing to freed memory. Clear the context before destroy in those branches.
🔧 Suggested fix
flb_input_set_context(in, ctx);
ctx->retry_coll_id = flb_input_set_collector_time(in, &in_amqp_reconnect, ctx->reconnect_retry_interval, 0, config);
if (ctx->retry_coll_id < 0) {
flb_plg_error(in, "Cannot create reconnection collector");
+ flb_input_set_context(in, NULL);
in_amqp_config_destroy(ctx);
return -1;
}
@@
ret = in_amqp_consumer_start(ctx, config);
if (ret < 0) {
flb_plg_error(in, "Cannot start AMQP consumer");
+ flb_input_set_context(in, NULL);
in_amqp_config_destroy(ctx);
return -1;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/in_amqp/in_amqp.c` around lines 586 - 603, The code sets the input
context with flb_input_set_context(in, ctx) but on the two late-init failure
paths (after flb_input_set_collector_time returning <0 and after
in_amqp_consumer_start returning <0) it calls in_amqp_config_destroy(ctx)
without clearing in->context, leaving a dangling pointer; update both failure
branches to call flb_input_set_context(in, NULL) (or otherwise clear the input
context) before calling in_amqp_config_destroy(ctx) and returning -1 so the
freed ctx is not referenced by the input instance.
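The ordering the suggestion asks for can be illustrated outside Fluent Bit. The sketch below is a minimal model, not the plugin's code: `demo_input`, `demo_ctx`, `demo_set_context`, `demo_init`, and `demo_exit` are hypothetical stand-ins for the input instance, the plugin context, and `flb_input_set_context()`. It only shows that detaching the context before freeing it leaves the instance with a NULL pointer rather than a dangling one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the Fluent Bit types; not the real API. */
struct demo_input {
    void *context;              /* plays the role of in->context */
};

struct demo_ctx {
    int retry_coll_id;
};

/* Plays the role of flb_input_set_context(): store the plugin context. */
static void demo_set_context(struct demo_input *in, void *ctx)
{
    assert(in != NULL);
    in->context = ctx;
}

/*
 * Simulated init. 'fail_late' forces a failure after the context has
 * been attached, which is the situation the review describes.
 */
static int demo_init(struct demo_input *in, int fail_late)
{
    struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

    if (ctx == NULL) {
        return -1;
    }

    demo_set_context(in, ctx);

    if (fail_late) {
        /* Detach first so the instance never points at freed memory. */
        demo_set_context(in, NULL);
        free(ctx);
        return -1;
    }

    return 0;
}

/* Normal teardown for the success path. */
static void demo_exit(struct demo_input *in)
{
    assert(in != NULL);
    free(in->context);
    in->context = NULL;
}
```

Whether the clearing belongs in each failure branch or inside the destroy function is a design choice for the PR author; the sketch takes no position on that.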
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8d485c33e5
} else if (frame.frame_type == AMQP_FRAME_METHOD) {
    switch (frame.payload.method.id) {
    case AMQP_CHANNEL_CLOSE_METHOD: {
Ignore heartbeat frames on unexpected-state consume replies
When amqp_consume_message() returns AMQP_STATUS_UNEXPECTED_STATE, this code only treats method frames as recoverable and falls through to connection teardown for any non-method frame. In RabbitMQ deployments with negotiated heartbeats (commonly non-zero), heartbeat frames can hit this path and trigger needless disconnect/reconnect cycles, causing intermittent ingestion gaps and churn instead of continuing consumption.
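One way to picture the suggested behaviour is a small dispatch on the frame type. This is a sketch under assumptions, not the plugin's logic: the `demo_*` names are hypothetical, and the numeric frame type codes (method 1, header 2, body 3, heartbeat 8) are the AMQP 0-9-1 values, redefined locally so the example builds without rabbitmq-c.

```c
#include <assert.h>

/* AMQP 0-9-1 frame type codes, redefined locally for the sketch. */
enum demo_frame_type {
    DEMO_FRAME_METHOD    = 1,
    DEMO_FRAME_HEADER    = 2,
    DEMO_FRAME_BODY      = 3,
    DEMO_FRAME_HEARTBEAT = 8
};

/* What the consumer loop should do next (hypothetical names). */
enum demo_action {
    DEMO_CONTINUE,        /* ignore the frame and keep consuming */
    DEMO_HANDLE_METHOD,   /* inspect the method id (e.g. channel.close) */
    DEMO_RECONNECT        /* tear down and schedule a reconnect */
};

/* Decide how to react to a frame seen after an unexpected-state reply. */
static enum demo_action demo_on_unexpected_frame(int frame_type)
{
    /* The frame type is a single octet on the wire. */
    assert(frame_type >= 0 && frame_type <= 255);

    switch (frame_type) {
    case DEMO_FRAME_HEARTBEAT:
        return DEMO_CONTINUE;
    case DEMO_FRAME_METHOD:
        return DEMO_HANDLE_METHOD;
    default:
        return DEMO_RECONNECT;
    }
}
```

Whether heartbeat frames can actually reach this path depends on how rabbitmq-c surfaces them, so the finding should be verified against the library before adopting this shape.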
queue_bytes.len = flb_sds_len(ctx->queue_name);
queue_bytes.bytes = ctx->queue_name;
if (amqp_basic_consume(c->conn, c->chan, queue_bytes, amqp_empty_bytes, 0, 1, 1, amqp_empty_table) == NULL) {
Avoid forcing exclusive consumers in basic.consume
The amqp_basic_consume call hard-codes exclusive=1, which makes the broker reject this consumer whenever the queue already has another active consumer. That turns normal shared-queue or failover topologies into repeated startup/reconnect failures; this should default to non-exclusive (or be configurable) so the plugin can attach to queues used by multiple consumers.
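A rough sketch of what "non-exclusive by default, or configurable" could look like follows. It assumes a hypothetical option holder `demo_consume_opts`; the PR's `struct flb_amqp` has no such fields. The three members correspond, in order, to the `no_local`, `no_ack`, and `exclusive` arguments that the quoted call passes as `0, 1, 1`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option holder; not part of the PR's struct flb_amqp. */
struct demo_consume_opts {
    int no_local;
    int no_ack;
    int exclusive;
};

/*
 * Defaults for the sketch: keep the PR's no_local=0 and no_ack=1,
 * but default to a non-exclusive consumer.
 */
static struct demo_consume_opts demo_consume_defaults(void)
{
    struct demo_consume_opts o = { 0, 1, 0 };

    return o;
}

/* Apply a user setting parsed elsewhere; non-zero means "exclusive". */
static void demo_set_exclusive(struct demo_consume_opts *o, int enabled)
{
    assert(o != NULL);
    o->exclusive = enabled ? 1 : 0;
}
```

With something like this in place, the consume call would pass `o.no_local`, `o.no_ack`, and `o.exclusive` instead of the literals, and users who want the current behaviour could opt back in.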
✅ Actions performed
Reviews resumed.
Actionable comments posted: 1
♻️ Duplicate comments (1)
plugins/in_amqp/in_amqp.c (1)
471-479: ⚠️ Potential issue | 🟠 Major
Clear input context during teardown to avoid dangling pointer after late init failures.
After flb_input_set_context(in, ctx) on Line 586, failures on Line 589 and Line 599 call in_amqp_config_destroy(ctx). ctx is freed, but in->context is not cleared, leaving stale state and possible double cleanup paths.
🔧 Proposed fix
static int in_amqp_config_destroy(struct flb_amqp *ctx)
{
+ if (ctx->ins) {
+ flb_input_set_context(ctx->ins, NULL);
+ }
+
flb_log_event_encoder_destroy(&ctx->encoder);
if (ctx->retry_coll_id >= 0) {
flb_input_collector_delete(ctx->retry_coll_id, ctx->ins);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/in_amqp/in_amqp.c` around lines 471 - 479, in_amqp_config_destroy currently frees ctx but does not clear the input instance's context pointer, leaving in->context dangling after failures that call this teardown; update the teardown so that after flb_input_set_context(in, ctx) failures and inside in_amqp_config_destroy you explicitly clear the input context (set in->context = NULL) before or immediately after freeing ctx, or change in_amqp_config_destroy signature to accept struct flb_input_instance *in and null out in->context inside the function (affecting functions: in_amqp_config_destroy, the code paths that call flb_input_set_context and call in_amqp_config_destroy).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/in_amqp/in_amqp.c`:
- Around line 650-675: The new AMQP input options declared in config_map
(fields: uri, queue, parser, reconnect.retry_limits, reconnect.retry_interval
tied to struct flb_amqp) lack user-facing documentation; add a concise docs
section describing each option's purpose and type, state the defaults ("amqp://"
for uri, no default for queue and parser, reconnect.retry_limits=5,
reconnect.retry_interval=60), show a minimal working example config snippet
using these keys, and update the plugin's README/docs index and changelog so the
PR no longer requires a docs follow-up.
---
Duplicate comments:
In `@plugins/in_amqp/in_amqp.c`:
- Around line 471-479: in_amqp_config_destroy currently frees ctx but does not
clear the input instance's context pointer, leaving in->context dangling after
failures that call this teardown; update the teardown so that after
flb_input_set_context(in, ctx) failures and inside in_amqp_config_destroy you
explicitly clear the input context (set in->context = NULL) before or
immediately after freeing ctx, or change in_amqp_config_destroy signature to
accept struct flb_input_instance *in and null out in->context inside the
function (affecting functions: in_amqp_config_destroy, the code paths that call
flb_input_set_context and call in_amqp_config_destroy).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- CMakeLists.txt
- cmake/plugins_options.cmake
- plugins/CMakeLists.txt
- plugins/in_amqp/CMakeLists.txt
- plugins/in_amqp/in_amqp.c
- plugins/in_amqp/in_amqp.h
🚧 Files skipped from review as they are similar to previous changes (2)
- CMakeLists.txt
- plugins/in_amqp/CMakeLists.txt
static struct flb_config_map config_map[] = {
    {
     FLB_CONFIG_MAP_STR, "uri", "amqp://",
     0, FLB_TRUE, offsetof(struct flb_amqp, uri),
     "Specify an AMQP URI to connect the broker"
    },
    {
     FLB_CONFIG_MAP_STR, "queue", NULL,
     0, FLB_TRUE, offsetof(struct flb_amqp, queue_name),
     "Specify an AMQP queue name to consume from"
    },
    {
     FLB_CONFIG_MAP_STR, "parser", NULL,
     0, FLB_TRUE, offsetof(struct flb_amqp, parser_name),
     "Set a parser"
    },
    {
     FLB_CONFIG_MAP_INT, "reconnect.retry_limits", "5",
     0, FLB_TRUE, offsetof(struct flb_amqp, reconnect_retry_limits),
     "Maximum number to retry to connect the broker"
    },
    {
     FLB_CONFIG_MAP_INT, "reconnect.retry_interval", "60",
     0, FLB_TRUE, offsetof(struct flb_amqp, reconnect_retry_interval),
     "Retry interval to connect the broker"
    },
Add user-facing documentation for the new AMQP input options before merge.
Lines 652-675 introduce externally configurable AMQP settings (uri, queue, parser, reconnect.*). The PR is marked docs-required and this surface should be documented with option semantics/defaults and a minimal example.
I can draft a concise docs section for these options if you want me to open a follow-up issue template.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/in_amqp/in_amqp.c` around lines 650 - 675, The new AMQP input options
declared in config_map (fields: uri, queue, parser, reconnect.retry_limits,
reconnect.retry_interval tied to struct flb_amqp) lack user-facing
documentation; add a concise docs section describing each option's purpose and
type, state the defaults ("amqp://" for uri, no default for queue and parser,
reconnect.retry_limits=5, reconnect.retry_interval=60), show a minimal working
example config snippet using these keys, and update the plugin's README/docs
index and changelog so the PR no longer requires a docs follow-up.
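As a compact summary of the defaults listed above, the option keys and their default strings can be written as a lookup table. This is only an illustrative sketch: `demo_option`, `demo_options`, and `demo_default_for` are hypothetical names, and the table restates the values from the quoted `config_map` rather than reading them from Fluent Bit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary entry: option key and its default (NULL = none). */
struct demo_option {
    const char *key;
    const char *def;
};

/* Values restated from the config_map quoted in the review. */
static const struct demo_option demo_options[] = {
    { "uri",                      "amqp://" },
    { "queue",                    NULL      },
    { "parser",                   NULL      },
    { "reconnect.retry_limits",   "5"       },
    { "reconnect.retry_interval", "60"      },
};

/* Return the default for a key, or NULL if it has none or is unknown. */
static const char *demo_default_for(const char *key)
{
    size_t i;

    assert(key != NULL);

    for (i = 0; i < sizeof(demo_options) / sizeof(demo_options[0]); i++) {
        if (strcmp(demo_options[i].key, key) == 0) {
            return demo_options[i].def;
        }
    }

    return NULL;
}
```

Since `queue` has no default, a user has to set it explicitly, which matches the test invocation in the PR description that passes `queue=flb.in_amqp`.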
# Inputs (sources, data collectors)
# =================================
DEFINE_OPTION(FLB_IN_AMQP "Enable AMQP input plugin" ON)
Does this compile for all supported targets, including Windows and macOS? If not, you need to provide the CMake overrides for those targets: there are macOS and Windows override files, and you can use the usual CMake config approach to enable or disable the plugin as required.
We provide a linked build script in the PR template to help test Linux targets.
There are no changes to any of the builds, so are no new dependencies required?
I've tested this only on Linux since I don't have access to macOS or Windows. From what I can see, this PR should work on all platforms.
There is a single new dependency for this plugin:
https://github.com/matwey/fluent-bit/blob/8d485c33e5eeab4b42489f56caeafb58bce3b166/CMakeLists.txt#L1337
rabbitmq-c is a cross-platform library. The existing vcpkg port suggests that it can be built for all platforms.
We will need to include the dependency for all builds as part of this change, then. It might be easier to disable the plugin for macOS and Windows for now; there are config files in cmake/ for this.
Linux builds are done using the containers under the packaging/ directory, so we can add the dependencies there if required, or include a vendored version under lib/ if we are building from source.
Here is an initial implementation of an AMQP input plugin. I think it is stable enough for a first review. The plugin was tested with RabbitMQ 4.2.
Enter [N/A] in the box if an item is not applicable to your change.
Testing
Before we can approve your change, please submit the following in a comment:
fluent-bit -i amqp -pqueue=flb.in_amqp -o stdout
Documentation
Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit