Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds per‑VRF FIB sizing and management for IPv4 and IPv6: new public API types/requests and stream responses for FIB config set/list, control‑plane logic to create, resize (with migration), and destroy per‑VRF FIBs on demand, plus VRF removal cleanup. Extends CLI with commands to set and show per‑VRF FIB config and wires per‑protocol handlers. Adds addr6_delete for IPv6 address deletion, exposes tbl8 usage stats in DPDK, and includes smoke tests and tools to exercise FIB configuration and large‑scale route injection. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/ip/control/route.c`:
- Around line 645-662: The code updates fib4_sizes[req->vrf_id] to the new size
before calling create_fib, so if create_fib (in route.c around the create_fib
call) fails and returns via api_out the fib4_sizes entry is left inconsistent
with the active FIB; fix by deferring the assignment or restoring the old value
on error: either move the fib4_sizes[req->vrf_id] = req->fib_size assignment to
after successful create_fib/new_fib setup, or if you must keep the assignment
earlier, ensure you reset fib4_sizes[req->vrf_id] = old_size before returning
(via api_out(errno, 0, NULL)) when create_fib returns NULL; reference
fib4_sizes, create_fib, old_fib, new_fib and the api_out return path.
- Around line 582-629: fib4_migrate_cb currently calls addr4_delete (via
nexthop_routes_cleanup → rib4_delete → rte_fib_delete) while the outer rib4_iter
is iterating, which can corrupt the RIB iterator; instead, collect addresses to
be deleted in the migration context and perform deletions after the iteration
completes. Concretely: in struct fib4_migrate_ctx add a small growable container
(e.g., an array or linked list) to record tuples (iface_id, ipv4, prefixlen)
when fib4_migrate_cb would have called addr4_delete; in fib4_migrate_cb (and
where you call nexthop_decref) push the address info into ctx->deletions and do
not alter the RIB or call addr4_delete there; after rib4_iter returns (the
caller that uses fib4_migrate_cb and owns struct fib4_migrate_ctx), iterate
ctx->deletions and call addr4_delete for each entry (and free the container),
preserving nexthop reference semantics and avoiding modifications during
rte_rib_get_nxt iteration.
In `@modules/ip6/control/route.c`:
- Around line 686-703: The code updates fib6_sizes[req->vrf_id] to req->fib_size
before attempting to allocate a new FIB and thus doesn't roll back on
create_fib6 failure; modify the control flow in the route resizing block so that
if create_fib6(req->vrf_id) returns NULL you restore fib6_sizes[req->vrf_id]
back to old_size (use the existing old_size variable) before returning via
api_out(errno, 0, NULL); ensure the rollback happens immediately after detecting
new_fib == NULL and before any returns or logging that assume the new size.
| static void fib4_migrate_cb( | ||
| uint16_t vrf_id, | ||
| ip4_addr_t ip, | ||
| uint8_t prefixlen, | ||
| gr_nh_origin_t origin, | ||
| const struct nexthop *nh, | ||
| void *priv | ||
| ) { | ||
| uint32_t host_ip = rte_be_to_cpu_32(ip); | ||
| struct fib4_migrate_ctx *ctx = priv; | ||
| const struct nexthop_info_l3 *addr; | ||
| int ret; | ||
|
|
||
| ret = rte_fib_add(ctx->new_fib, host_ip, prefixlen, nh_ptr_to_id(nh)); | ||
| if (ret < 0) { | ||
| if (nh->type == GR_NH_T_L3) { | ||
| addr = nexthop_info_l3(nh); | ||
| if (!(addr->flags & NH_LOCAL_ADDR_FLAGS)) | ||
| addr = NULL; | ||
| } else { | ||
| addr = NULL; | ||
| } | ||
| if (addr != NULL) { | ||
| LOG(WARNING, | ||
| "iface %u: dropping local address " IP4_F "/%hhu: %s", | ||
| nh->iface_id, | ||
| &ip, | ||
| prefixlen, | ||
| rte_strerror(-ret)); | ||
| addr4_delete(nh->iface_id, addr->ipv4, addr->prefixlen); | ||
| } else { | ||
| LOG(WARNING, | ||
| "vrf %u: dropping route " IP4_F "/%hhu: %s", | ||
| vrf_id, | ||
| &ip, | ||
| prefixlen, | ||
| rte_strerror(-ret)); | ||
| nexthop_decref((struct nexthop *)nh); | ||
| } | ||
| return; | ||
| } | ||
|
|
||
| struct rte_rib *rib = rte_fib_get_rib(ctx->new_fib); | ||
| struct rte_rib_node *rn = rte_rib_lookup_exact(rib, host_ip, prefixlen); | ||
| gr_nh_origin_t *o = rte_rib_get_ext(rn); | ||
| *o = origin; | ||
| ctx->counts[origin]++; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let me find the file and understand the structure
git ls-files | grep -E "(route|address)\.c$" | head -10Repository: DPDK/grout
Length of output: 319
🏁 Script executed:
# Check addr4_delete implementation
rg -n -A 20 'addr4_delete' modules/ip/control/ --type c | head -100Repository: DPDK/grout
Length of output: 5848
🏁 Script executed:
# Check nexthop_routes_cleanup
rg -n -A 15 'nexthop_routes_cleanup' modules/ip/control/ --type cRepository: DPDK/grout
Length of output: 955
🏁 Script executed:
# Check rib4_cleanup and rib4_delete
rg -n -A 10 'void rib4_cleanup|void rib4_delete' modules/ip/control/ --type cRepository: DPDK/grout
Length of output: 2020
🏁 Script executed:
# Check fib4_conf_set and how it calls rib4_iter
rg -n -B 5 -A 30 'fib4_conf_set' modules/ip/control/route.c | grep -A 30 'rib4_iter'Repository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Find nexthop_routes_cleanup implementation
rg -n -A 20 'void nexthop_routes_cleanup' modules/ --type cRepository: DPDK/grout
Length of output: 3105
🏁 Script executed:
# Find rib4_delete implementation
rg -n -B 2 -A 15 'void rib4_delete|int rib4_delete' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 561
🏁 Script executed:
# Get full rib4_cleanup_nh implementation
rg -n -A 15 'static void rib4_cleanup_nh' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 468
🏁 Script executed:
# Find fib4_conf_set and the outer rib4_iter call
rg -n -A 50 'static struct api_out fib4_conf_set' modules/ip/control/route.c | head -100Repository: DPDK/grout
Length of output: 1657
🏁 Script executed:
# Find rib4_iter implementation
rg -n -B 2 -A 25 'void rib4_iter' modules/ip/control/route.c | grep -v "^--$"Repository: DPDK/grout
Length of output: 1042
🏁 Script executed:
# Also check what af_ops is set to for IPv4 to confirm cleanup_routes is rib4_cleanup
rg -n -B 5 -A 5 'af_ops\[.*\].*=' modules/infra/control/l3_nexthop.cRepository: DPDK/grout
Length of output: 489
🏁 Script executed:
# Check if there's any RCU synchronization or special handling in rib4_iter
rg -n -B 3 'rte_rib_get_nxt' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 200
🏁 Script executed:
# Get the full fib4_migrate_cb function context
rg -n -B 5 'static void fib4_migrate_cb' modules/ip/control/route.c | head -60Repository: DPDK/grout
Length of output: 213
🏁 Script executed:
# Check rib4_delete to confirm it modifies RIB
rg -n -A 30 'int rib4_delete' modules/ip/control/route.c | head -80Repository: DPDK/grout
Length of output: 977
🏁 Script executed:
# Let me verify the RIB structure and whether get_fib returns the same RIB
rg -n -A 10 'struct rte_fib \*get_fib' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 316
🏁 Script executed:
# Check how vrf_fibs is used and whether old_fib is the same during both iterations
rg -n -B 2 'vrf_fibs\[' modules/ip/control/route.c | head -40Repository: DPDK/grout
Length of output: 1079
🏁 Script executed:
# Check if there are any constraints or guarantees about rte_fib_add during migration
rg -n -B 5 'rte_fib_add.*new_fib' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 277
🏁 Script executed:
# Check what error conditions rte_fib_add can return
rg -n 'ret < 0' modules/ip/control/route.c | grep -A 2 'fib_add\|fib_migrate'Repository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Let me check if DPDK's rte_fib_delete has any guarantees about safety during iteration
rg -n 'RTE_RIB_GET_NXT_ALL' modules/ip/control/ --type cRepository: DPDK/grout
Length of output: 163
🏁 Script executed:
# Check if there are any RCU or synchronization mechanisms protecting against this
rg -n 'rte_rcu\|synchronize\|RCU' modules/ip/control/route.c | head -20Repository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Verify the exact scenario: look at when rib4_cleanup would be called during fib4_migrate_cb
cat -n modules/ip/control/route.c | sed -n '582,629p'Repository: DPDK/grout
Length of output: 1571
Nested RIB iteration during migration can corrupt iterator state.
When fib4_migrate_cb calls addr4_delete (line 611) for local addresses that fail to insert, the deletion path goes through nexthop_routes_cleanup → rib4_cleanup → rib4_iter(GR_VRF_ID_UNDEF, ...) → rib4_delete, which calls rte_fib_delete and modifies the RIB structure. This RIB is the same one being traversed by the outer rib4_iter(req->vrf_id, ...) call at line 665. Modifying the radix tree while rte_rib_get_nxt is iterating can invalidate the current iterator state (rn), leading to skipped entries or use-after-free.
Collect addresses that need deletion during the migration iteration, then delete them after rib4_iter completes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip/control/route.c` around lines 582 - 629, fib4_migrate_cb currently
calls addr4_delete (via nexthop_routes_cleanup → rib4_delete → rte_fib_delete)
while the outer rib4_iter is iterating, which can corrupt the RIB iterator;
instead, collect addresses to be deleted in the migration context and perform
deletions after the iteration completes. Concretely: in struct fib4_migrate_ctx
add a small growable container (e.g., an array or linked list) to record tuples
(iface_id, ipv4, prefixlen) when fib4_migrate_cb would have called addr4_delete;
in fib4_migrate_cb (and where you call nexthop_decref) push the address info
into ctx->deletions and do not alter the RIB or call addr4_delete there; after
rib4_iter returns (the caller that uses fib4_migrate_cb and owns struct
fib4_migrate_ctx), iterate ctx->deletions and call addr4_delete for each entry
(and free the container), preserving nexthop reference semantics and avoiding
modifications during rte_rib_get_nxt iteration.
modules/ip6/control/route.c
Outdated
| if (req->fib_size == old_size && old_fib != NULL) | ||
| return api_out(0, 0, NULL); | ||
|
|
||
| fib6_sizes[req->vrf_id] = req->fib_size; | ||
|
|
||
| if (old_fib == NULL) | ||
| return api_out(0, 0, NULL); | ||
|
|
||
| LOG(INFO, | ||
| "resizing IPv6 FIB VRF %s(%u) %u -> %u", | ||
| vrf->name, | ||
| vrf->id, | ||
| old_size, | ||
| req->fib_size); | ||
|
|
||
| new_fib = create_fib6(req->vrf_id); | ||
| if (new_fib == NULL) | ||
| return api_out(errno, 0, NULL); |
There was a problem hiding this comment.
Same rollback gap as IPv4: fib6_sizes updated (line 689) before create_fib6 — not restored on failure (line 702–703).
Proposed fix: restore old size on failure
new_fib = create_fib6(req->vrf_id);
- if (new_fib == NULL)
+ if (new_fib == NULL) {
+ fib6_sizes[req->vrf_id] = old_size;
return api_out(errno, 0, NULL);
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (req->fib_size == old_size && old_fib != NULL) | |
| return api_out(0, 0, NULL); | |
| fib6_sizes[req->vrf_id] = req->fib_size; | |
| if (old_fib == NULL) | |
| return api_out(0, 0, NULL); | |
| LOG(INFO, | |
| "resizing IPv6 FIB VRF %s(%u) %u -> %u", | |
| vrf->name, | |
| vrf->id, | |
| old_size, | |
| req->fib_size); | |
| new_fib = create_fib6(req->vrf_id); | |
| if (new_fib == NULL) | |
| return api_out(errno, 0, NULL); | |
| if (req->fib_size == old_size && old_fib != NULL) | |
| return api_out(0, 0, NULL); | |
| fib6_sizes[req->vrf_id] = req->fib_size; | |
| if (old_fib == NULL) | |
| return api_out(0, 0, NULL); | |
| LOG(INFO, | |
| "resizing IPv6 FIB VRF %s(%u) %u -> %u", | |
| vrf->name, | |
| vrf->id, | |
| old_size, | |
| req->fib_size); | |
| new_fib = create_fib6(req->vrf_id); | |
| if (new_fib == NULL) { | |
| fib6_sizes[req->vrf_id] = old_size; | |
| return api_out(errno, 0, NULL); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip6/control/route.c` around lines 686 - 703, The code updates
fib6_sizes[req->vrf_id] to req->fib_size before attempting to allocate a new FIB
and thus doesn't roll back on create_fib6 failure; modify the control flow in
the route resizing block so that if create_fib6(req->vrf_id) returns NULL you
restore fib6_sizes[req->vrf_id] back to old_size (use the existing old_size
variable) before returning via api_out(errno, 0, NULL); ensure the rollback
happens immediately after detecting new_fib == NULL and before any returns or
logging that assume the new size.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/ip/control/route.c`:
- Around line 773-788: The iface_rm_cb frees the FIB without RCU synchronization
causing a potential use-after-free; update iface_rm_cb to mirror fib4_conf_set
by calling rte_rcu_qsbr_synchronize() after clearing vrf_fibs[iface->id] and
before calling rte_fib_free(fib) so in-flight datapath readers finish first
(apply the same pattern to the IPv6 counterpart in modules/ip6/control/route.c);
reference the iface_rm_cb function, the fib variable, vrf_fibs[],
rte_rcu_qsbr_synchronize, rte_fib_free, and route4_vrf_rm to locate and patch
the code.
In `@modules/ip6/control/route.c`:
- Around line 820-826: The iface removal callback iface_rm_cb clears
fib6_sizes[iface->id] and vrf_fibs[iface->id] but leaves route_counts[iface->id]
stale; update iface_rm_cb to zero the entire per-VRF entry in route_counts for
iface->id (clear all origins/counts) when tearing down the VRF so recycled IDs
don't inherit counts, ensuring rib6_delete's assert and rib6_metrics_collect see
accurate zeros for that VRF.
- Around line 44-56: The code multiplies size by 4 in fib6_make_conf which can
overflow for large fib_size; fix by bounding the value before multiplication and
by validating in fib6_conf_set so fib6_sizes[] isn't updated with a bad value.
In fib6_conf_set, check req->fib_size against UINT32_MAX/4 and reject or clamp
it before storing to fib6_sizes; in fib6_make_conf, perform the multiplication
using a checked/capped approach (e.g., if size > UINT32_MAX/4 set size =
UINT32_MAX/4) so .max_routes = size * 4 cannot overflow.
---
Duplicate comments:
In `@modules/ip/control/route.c`:
- Around line 654-668: The assignment to fib4_sizes[req->vrf_id] is done before
create_fib succeeds, leaving fib4_sizes inconsistent if create_fib fails; update
the code so you only set fib4_sizes[req->vrf_id] after create_fib returns a
non-NULL new_fib (or, alternatively, restore the old value before returning on
failure). Concretely, move the write to fib4_sizes (or add a rollback setting
fib4_sizes[req->vrf_id] = old_size) so that create_fib(req->vrf_id) and the NULL
check against new_fib happen before permanently updating fib4_sizes; reference
functions/variables: fib4_sizes, create_fib, new_fib, old_fib, api_out, vrf,
req->fib_size.
- Around line 584-635: The callback fib4_migrate_cb is modifying the same RIB
being iterated (via addr4_delete and nexthop_decref when rte_fib_add fails),
which can corrupt rte_rib_get_nxt traversal; instead, do not modify or free
routes/nexthops inside fib4_migrate_cb: record deletions/events in the migration
context (e.g., add entries to a ctx->pending_deletes list or increment a
counter) and return without calling addr4_delete or nexthop_decref; after
rib4_iter_one finishes and the swap to ctx->new_fib is complete, iterate
ctx->pending_deletes to call addr4_delete, gr_event_push and nexthop_decref as
needed. Ensure you reference fib4_migrate_cb, rte_fib_add, addr4_delete,
nexthop_decref, rib4_iter_one and ctx->new_fib when implementing the
deferred-delete approach.
In `@modules/ip6/control/route.c`:
- Around line 694-708: The code sets fib6_sizes[req->vrf_id] before attempting
to create a new FIB and does not restore it if create_fib6(req->vrf_id) fails;
update the resize path in the function containing the old_fib check so that if
create_fib6 returns NULL you restore fib6_sizes[req->vrf_id] back to old_size
(or the previous value) before returning via api_out(errno, 0, NULL). Ensure you
reference and restore the same fib6_sizes entry using req->vrf_id and keep the
existing early-return behavior when old_fib == NULL and when create_fib6 fails.
| static void iface_rm_cb(uint32_t /*ev_type*/, const void *obj) { | ||
| const struct iface *iface = obj; | ||
| struct rte_fib *fib; | ||
|
|
||
| if (iface->type != GR_IFACE_TYPE_VRF) | ||
| return; | ||
|
|
||
| fib4_sizes[iface->id] = 0; | ||
| fib = vrf_fibs[iface->id]; | ||
| vrf_fibs[iface->id] = NULL; | ||
| if (fib != NULL) { | ||
| LOG(INFO, "destroying IPv4 FIB for VRF %s(%u)", iface->name, iface->id); | ||
| rib4_iter_one(rte_fib_get_rib(fib), iface->id, route4_vrf_rm, (void *)iface); | ||
| rte_fib_free(fib); | ||
| } | ||
| } |
There was a problem hiding this comment.
iface_rm_cb frees the FIB without RCU synchronization — use-after-free in the datapath.
After vrf_fibs[iface->id] = NULL (line 782), a datapath worker may already hold the old fib pointer (read from get_fib before the NULL write) and still be executing inside rte_fib_lookup_bulk when rte_fib_free(fib) is called. fib4_conf_set correctly guards the same pattern with rte_rcu_qsbr_synchronize (line 674), but iface_rm_cb does not. The same omission is expected in the analogous iface_rm_cb in modules/ip6/control/route.c.
🔒 Proposed fix
fib4_sizes[iface->id] = 0;
fib = vrf_fibs[iface->id];
vrf_fibs[iface->id] = NULL;
if (fib != NULL) {
LOG(INFO, "destroying IPv4 FIB for VRF %s(%u)", iface->name, iface->id);
rib4_iter_one(rte_fib_get_rib(fib), iface->id, route4_vrf_rm, (void *)iface);
+ rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID);
rte_fib_free(fib);
}Apply the same fix to iface_rm_cb in modules/ip6/control/route.c.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip/control/route.c` around lines 773 - 788, The iface_rm_cb frees the
FIB without RCU synchronization causing a potential use-after-free; update
iface_rm_cb to mirror fib4_conf_set by calling rte_rcu_qsbr_synchronize() after
clearing vrf_fibs[iface->id] and before calling rte_fib_free(fib) so in-flight
datapath readers finish first (apply the same pattern to the IPv6 counterpart in
modules/ip6/control/route.c); reference the iface_rm_cb function, the fib
variable, vrf_fibs[], rte_rcu_qsbr_synchronize, rte_fib_free, and route4_vrf_rm
to locate and patch the code.
modules/ip6/control/route.c
Outdated
| static struct rte_fib6_conf fib6_make_conf(uint16_t vrf_id) { | ||
| uint32_t size = fib6_size(vrf_id); | ||
| return (struct rte_fib6_conf) { | ||
| .type = RTE_FIB6_TRIE, | ||
| .default_nh = 0, | ||
| .max_routes = size * 4, | ||
| .rib_ext_sz = sizeof(gr_nh_origin_t), | ||
| .trie = { | ||
| .nh_sz = RTE_FIB6_TRIE_8B, | ||
| .num_tbl8 = size, | ||
| }, | ||
| }; | ||
| } |
There was a problem hiding this comment.
max_routes = size * 4 silently overflows for large fib_size values.
size is uint32_t; when req->fib_size > 1073741823 (UINT32_MAX / 4), size * 4 wraps to a small (possibly zero) value. A fib_size of 0x40000000 produces max_routes = 0, and rte_fib6_create will either reject it or create a zero-capacity FIB. This is compounded by the fact that fib6_sizes[req->vrf_id] is written before create_fib6 is called, so the bad value persists even if creation fails.
Add an upper-bound guard in fib6_conf_set:
🐛 Proposed fix
if (req->fib_size == 0)
return api_out(EINVAL, 0, NULL);
+
+ if (req->fib_size > UINT32_MAX / 4)
+ return api_out(ERANGE, 0, NULL);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip6/control/route.c` around lines 44 - 56, The code multiplies size
by 4 in fib6_make_conf which can overflow for large fib_size; fix by bounding
the value before multiplication and by validating in fib6_conf_set so
fib6_sizes[] isn't updated with a bad value. In fib6_conf_set, check
req->fib_size against UINT32_MAX/4 and reject or clamp it before storing to
fib6_sizes; in fib6_make_conf, perform the multiplication using a checked/capped
approach (e.g., if size > UINT32_MAX/4 set size = UINT32_MAX/4) so .max_routes =
size * 4 cannot overflow.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/ip/control/route.c`:
- Around line 45-57: The multiplication size * 4 in fib4_make_conf can overflow
when fib4_size (size) is > UINT32_MAX/4; update fib4_conf_set to validate
req->fib_size (or the value returned by fib4_size) before calling fib4_make_conf
and either clamp to a safe maximum (UINT32_MAX/4) or return an error; ensure
fib4_make_conf uses a uint64_t intermediate or verified value to compute
max_routes without wraparound and set .max_routes from that checked value,
referencing fib4_make_conf, fib4_conf_set, and the .max_routes field.
---
Duplicate comments:
In `@modules/ip/control/route.c`:
- Around line 654-668: The code updates fib4_sizes[req->vrf_id] before calling
create_fib, so on create_fib failure the stored size is wrong; move the
assignment to after create_fib succeeds (i.e., call create_fib(req->vrf_id)
first, check for NULL and return api_out(errno, 0, NULL) on failure, then set
fib4_sizes[req->vrf_id] = req->fib_size and proceed), or alternatively restore
the old value on failure before returning; reference fib4_sizes, req->vrf_id,
create_fib and api_out when making the change.
- Around line 773-789: iface_rm_cb currently NULLs vrf_fibs[iface->id] then
immediately calls rte_fib_free causing a use-after-free for datapath workers
that may still hold the old pointer; change iface_rm_cb to mirror fib4_conf_set
by saving fib into a local, nulling vrf_fibs[iface->id], then call
rte_rcu_qsbr_synchronize() to wait for readers, and only after that call
rte_fib_free(fib); keep the existing rib4_iter_one/rte_fib_get_rib call ordering
intact and ensure route_counts[iface->id] is still memset after the free as
before.
- Around line 584-635: fib4_migrate_cb currently calls addr4_delete during RIB
iteration which can mutate the same RIB iterator (via nexthop_routes_cleanup →
rib4_cleanup → rib4_iter → rib4_delete) and corrupt rte_rib_get_nxt state;
instead, modify the fib4_migrate_ctx to hold a temporary list/array of addresses
to remove (e.g., stored as {iface_id, ipv4, prefixlen} entries), change
fib4_migrate_cb to push failed-local-addresses into that list rather than
calling addr4_delete immediately, and after the outer rib4_iter_one finishes
(where fib4_migrate_cb is invoked), iterate the collected list and call
addr4_delete for each entry; ensure the new list is allocated/cleared in the
migration setup/teardown and that reference counting (nexthop_decref) and
logging behavior remain unchanged.
In `@modules/ip6/control/route.c`:
- Around line 694-708: The code updates fib6_sizes[req->vrf_id] to req->fib_size
before attempting to allocate a new FIB with create_fib6, so if create_fib6
fails the stored size is inconsistent; fix by saving the previous size (e.g.,
old_size or a temp variable) before assigning fib6_sizes[req->vrf_id] and, if
create_fib6 returns NULL, restore fib6_sizes[req->vrf_id] back to the saved
value before returning via api_out(errno, 0, NULL), keeping the existing error
return path and references to req->vrf_id, create_fib6, new_fib, old_fib intact.
- Around line 44-56: The calculation max_routes = size * 4 in fib6_make_conf can
overflow because fib6_conf_set only checks req->fib_size == 0; add an
upper-bound guard in the fib6_conf_set handler to validate req->fib_size (or
computed size) against a safe maximum so size*4 cannot overflow (or clamp size
to that max) before calling fib6_make_conf, and/or add a defensive check inside
fib6_make_conf to verify size is within bounds and fail early if not; reference
the fib6_make_conf function, the max_routes field, fib6_conf_set handler, and
req->fib_size when implementing the validation.
- Around line 813-829: In iface_rm_cb: don’t free the IPv6 FIB (rte_fib6_free)
immediately after nulling vrf_fibs[iface->id] because datapath readers may still
access the old pointer; instead set vrf_fibs[iface->id] = NULL, call the RCU
synchronize primitive (rte_rcu_qsbr_synchronize or equivalent) to wait for
readers, then call rte_fib6_free on fib; keep the existing
rib6_iter_one/rte_fib6_get_rib/route6_vrf_rm cleanup but move the rte_fib6_free
until after synchronization and ensure any related arrays (fib6_sizes,
route_counts) are updated as before.
- Around line 625-675: fib6_migrate_cb is mutating the RIB during rib6_iter by
calling addr6_delete (via nexthop_routes_cleanup → rib6_cleanup → rib6_delete),
which can corrupt the outer iteration; instead, stop performing addr6_delete
inside fib6_migrate_cb and buffer deletions for post-iteration processing: add a
deletions list/array to struct fib6_migrate_ctx and, in fib6_migrate_cb, when
detecting a local address that needs removal (the branch that currently calls
addr6_delete), append {iface_id, ipv6, prefixlen} to ctx->deletes and still log
the warning; after the rib6_iter completes, iterate ctx->deletes and call
addr6_delete for each entry (preserving nexthop_decref behavior for non-local
branches as-is), ensuring no RIB-modifying calls
(addr6_delete/rib6_delete/rib6_cleanup) occur inside fib6_migrate_cb or other
callbacks used during rib6_iter.
modules/ip/control/route.c
Outdated
| static struct rte_fib_conf fib4_make_conf(uint16_t vrf_id) { | ||
| uint32_t size = fib4_size(vrf_id); | ||
| return (struct rte_fib_conf) { | ||
| .type = RTE_FIB_DIR24_8, | ||
| .default_nh = 0, | ||
| .max_routes = size * 4, | ||
| .rib_ext_sz = sizeof(gr_nh_origin_t), | ||
| .dir24_8 = { | ||
| .nh_sz = RTE_FIB_DIR24_8_8B, | ||
| .num_tbl8 = size, | ||
| }, | ||
| }; | ||
| } |
There was a problem hiding this comment.
max_routes = size * 4 silently overflows for large fib_size values.
size is uint32_t; when req->fib_size > UINT32_MAX / 4 (~1.07 billion), size * 4 wraps around. The CLI allows values up to UINT32_MAX. Add an upper-bound check in fib4_conf_set:
Proposed fix
if (req->fib_size == 0)
return api_out(EINVAL, 0, NULL);
+
+ if (req->fib_size > UINT32_MAX / 4)
+ return api_out(ERANGE, 0, NULL);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip/control/route.c` around lines 45 - 57, The multiplication size * 4
in fib4_make_conf can overflow when fib4_size (size) is > UINT32_MAX/4; update
fib4_conf_set to validate req->fib_size (or the value returned by fib4_size)
before calling fib4_make_conf and either clamp to a safe maximum (UINT32_MAX/4)
or return an error; ensure fib4_make_conf uses a uint64_t intermediate or
verified value to compute max_routes without wraparound and set .max_routes from
that checked value, referencing fib4_make_conf, fib4_conf_set, and the
.max_routes field.
modules/infra/cli/route.c
Outdated
| struct libscols_table *table = scols_new_table(); | ||
| scols_table_new_column(table, "VRF", 0, 0); | ||
| scols_table_new_column(table, "AF", 0, 0); | ||
| scols_table_new_column(table, "FIB_SIZE", 0, SCOLS_FL_RIGHT); |
There was a problem hiding this comment.
Is FIB_SIZE the max number of routes we can inject into a specific VRF ?
In this CLI command (or in the route list), it would help to list the number of used slots as well.
There was a problem hiding this comment.
No it is the number of 8bit slots in the FIB. Depending on the prefix length distribution, it may allow more or less routes in total.
I don't think we can access the number of used slots in the rte_fib api.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (6)
modules/ip6/control/route.c (2)
699-713:⚠️ Potential issue | 🟠 MajorRollback
fib6_sizesifcreate_fib6fails.If allocation fails,
fib6_sizesstays at the new value while the old FIB remains active, and subsequent resize attempts become no-ops. Restoreold_sizeon failure.🐛 Proposed fix
new_fib = create_fib6(req->vrf_id); - if (new_fib == NULL) - return api_out(errno, 0, NULL); + if (new_fib == NULL) { + fib6_sizes[req->vrf_id] = old_size; + return api_out(errno, 0, NULL); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/ip6/control/route.c` around lines 699 - 713, The code updates fib6_sizes[req->vrf_id] before creating the new FIB so if create_fib6(req->vrf_id) fails the fib6_sizes entry remains the new value while the old FIB is still active; modify the error path so that when create_fib6 returns NULL you restore fib6_sizes[req->vrf_id] = old_size before returning (i.e., undo the earlier assignment on failure), keeping the LOG and api_out(errno, 0, NULL) behavior otherwise; refer to fib6_sizes, req->vrf_id, old_size, create_fib6 and api_out in the fix.
44-55:⚠️ Potential issue | 🟠 MajorGuard against
max_routesoverflow for largefib_size.
size * 4can wrap whenfib_sizeexceedsUINT32_MAX / 4, yielding a tiny (or zero)max_routesand an undersized FIB. Add an upper-bound check (and clamp) before the multiply.🐛 Proposed fix
static struct rte_fib6_conf fib6_make_conf(uint16_t vrf_id) { uint32_t size = fib6_size(vrf_id); + if (size > UINT32_MAX / 4) + size = UINT32_MAX / 4; return (struct rte_fib6_conf) { .type = RTE_FIB6_TRIE, .default_nh = 0, .max_routes = size * 4, @@ if (req->fib_size == 0) return api_out(EINVAL, 0, NULL); + if (req->fib_size > UINT32_MAX / 4) + return api_out(ERANGE, 0, NULL);#!/bin/bash # Locate other max_routes derivations that might need the same guard. rg -n "max_routes\s*=\s*.*\*\s*4" -g'*.c'Also applies to: 691-693
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/ip6/control/route.c` around lines 44 - 55, The calculation for max_routes in fib6_make_conf can overflow when size * 4 exceeds UINT32_MAX; update fib6_make_conf to compute the product in a wider type (e.g., uint64_t tmp = (uint64_t)size * 4), clamp tmp to UINT32_MAX (or the max of the .max_routes field) and then assign the clamped uint32_t value to .max_routes; keep references to fib6_make_conf, fib6_size and the .max_routes field so you locate and replace the current size * 4 expression safely.modules/ip/control/route.c (4)
659-673:fib4_sizesnot restored oncreate_fibfailure — already flagged.Line 659 updates
fib4_sizes[req->vrf_id]beforecreate_fib(line 671). Ifcreate_fibfails, the stored size is inconsistent with the active FIB. This was raised in a previous review.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/ip/control/route.c` around lines 659 - 673, The code updates fib4_sizes[req->vrf_id] before calling create_fib, so if create_fib(req->vrf_id) fails the recorded size becomes inconsistent; either move the assignment of fib4_sizes[req->vrf_id] to after create_fib returns successfully, or if you must set it earlier, restore fib4_sizes[req->vrf_id] to old_size when create_fib returns NULL (before returning api_out(errno,...)); update the function surrounding symbols (fib4_sizes, req->vrf_id, create_fib, api_out) accordingly so the stored FIB size always matches the active FIB on error paths.
773-789:iface_rm_cbmissing RCU synchronization beforerte_fib_free— already flagged.Datapath workers may still hold a pointer to the old FIB after
vrf_fibs[iface->id] = NULL(line 782). Withoutrte_rcu_qsbr_synchronizebeforerte_fib_free(fib)(line 786), this is a use-after-free. This was raised in a previous review.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/ip/control/route.c` around lines 773 - 789, The iface_rm_cb callback clears vrf_fibs[iface->id] and immediately calls rte_fib_free(fib), risking use-after-free by datapath readers; add an RCU grace-period wait (call rte_rcu_qsbr_synchronize or the project's equivalent) after vrf_fibs[iface->id] = NULL and before rte_fib_free(fib) to ensure readers referencing rte_fib_get_rib(fib) have quiesced; keep the existing rib4_iter_one(..., route4_vrf_rm, ...) ordering but perform synchronization between nulling vrf_fibs and freeing the fib.
45-57:max_routes = size * 4overflow for largefib_size— already flagged.No upper-bound validation on
req->fib_size; values aboveUINT32_MAX / 4cause silent wraparound. This was raised in a previous review.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/ip/control/route.c` around lines 45 - 57, The multiplication size * 4 in fib4_make_conf can overflow uint32_t for large fib4_size values; change the local variable to a 64-bit type (e.g., uint64_t s = fib4_size(vrf_id)), clamp s to UINT32_MAX / 4 if it exceeds that bound, then compute max_routes from the clamped value (max_routes = (uint32_t)(s * 4)); update fib4_make_conf to use this clamped value to avoid silent wraparound. Ensure you include <stdint.h> and use the symbol names fib4_make_conf and max_routes in the change so the location is clear.
589-632: Nested RIB modification during iteration infib4_migrate_cb— already flagged.
addr4_delete(line 611) can triggernexthop_routes_cleanup→rib4_cleanup→rib4_iter(GR_VRF_ID_UNDEF, ...)→rib4_delete, modifying the same RIB being iterated by the outerrib4_iter_onecall. This was raised in a previous review.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/ip/control/route.c`:
- Around line 72-98: The FIB name can collide because create_fib() uses the
requested fib4_sizes[vrf_id] when formatting name before rte_fib_create(), so
add a monotonic, thread-safe counter to the name to guarantee uniqueness:
introduce a static per-process (or per-vrf) atomic counter (e.g., fib_name_seq
or similar) and include its current value in the snprintf format for the FIB
name used by create_fib(); increment the counter atomically before calling
rte_fib_create() and otherwise keep the rest of create_fib() logic (including
updating fib4_sizes[vrf_id] with the actual allocated size after successful
creation and freeing on error) unchanged so the name never collides even if DPDK
rounded the previous allocation.
---
Duplicate comments:
In `@modules/ip/control/route.c`:
- Around line 659-673: The code updates fib4_sizes[req->vrf_id] before calling
create_fib, so if create_fib(req->vrf_id) fails the recorded size becomes
inconsistent; either move the assignment of fib4_sizes[req->vrf_id] to after
create_fib returns successfully, or if you must set it earlier, restore
fib4_sizes[req->vrf_id] to old_size when create_fib returns NULL (before
returning api_out(errno,...)); update the function surrounding symbols
(fib4_sizes, req->vrf_id, create_fib, api_out) accordingly so the stored FIB
size always matches the active FIB on error paths.
- Around line 773-789: The iface_rm_cb callback clears vrf_fibs[iface->id] and
immediately calls rte_fib_free(fib), risking use-after-free by datapath readers;
add an RCU grace-period wait (call rte_rcu_qsbr_synchronize or the project's
equivalent) after vrf_fibs[iface->id] = NULL and before rte_fib_free(fib) to
ensure readers referencing rte_fib_get_rib(fib) have quiesced; keep the existing
rib4_iter_one(..., route4_vrf_rm, ...) ordering but perform synchronization
between nulling vrf_fibs and freeing the fib.
- Around line 45-57: The multiplication size * 4 in fib4_make_conf can overflow
uint32_t for large fib4_size values; change the local variable to a 64-bit type
(e.g., uint64_t s = fib4_size(vrf_id)), clamp s to UINT32_MAX / 4 if it exceeds
that bound, then compute max_routes from the clamped value (max_routes =
(uint32_t)(s * 4)); update fib4_make_conf to use this clamped value to avoid
silent wraparound. Ensure you include <stdint.h> and use the symbol names
fib4_make_conf and max_routes in the change so the location is clear.
In `@modules/ip6/control/route.c`:
- Around line 699-713: The code updates fib6_sizes[req->vrf_id] before creating
the new FIB so if create_fib6(req->vrf_id) fails the fib6_sizes entry remains
the new value while the old FIB is still active; modify the error path so that
when create_fib6 returns NULL you restore fib6_sizes[req->vrf_id] = old_size
before returning (i.e., undo the earlier assignment on failure), keeping the LOG
and api_out(errno, 0, NULL) behavior otherwise; refer to fib6_sizes,
req->vrf_id, old_size, create_fib6 and api_out in the fix.
- Around line 44-55: The calculation for max_routes in fib6_make_conf can
overflow when size * 4 exceeds UINT32_MAX; update fib6_make_conf to compute the
product in a wider type (e.g., uint64_t tmp = (uint64_t)size * 4), clamp tmp to
UINT32_MAX (or the max of the .max_routes field) and then assign the clamped
uint32_t value to .max_routes; keep references to fib6_make_conf, fib6_size and
the .max_routes field so you locate and replace the current size * 4 expression
safely.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
modules/infra/cli/gr_cli_l3.hmodules/infra/cli/route.cmodules/ip/api/gr_ip4.hmodules/ip/cli/route.cmodules/ip/control/route.cmodules/ip6/api/gr_ip6.hmodules/ip6/cli/route.cmodules/ip6/control/address.cmodules/ip6/control/gr_ip6_control.hmodules/ip6/control/route.csmoke/config_vrf_test.shsubprojects/dpdk.wrapsubprojects/packagefiles/dpdk/fib-expose-tbl8-usage-statistics.patch
🚧 Files skipped from review as they are similar to previous changes (1)
- smoke/config_vrf_test.sh
| static struct rte_fib *create_fib(uint16_t vrf_id) { | ||
| struct rte_fib_conf conf = fib4_make_conf(vrf_id); | ||
| struct rte_fib *fib; | ||
| char name[64]; | ||
| int ret; | ||
|
|
||
| if (vrf_id >= GR_MAX_IFACES) | ||
| return errno_set_null(EOVERFLOW); | ||
| snprintf(name, sizeof(name), "fib4_vrf_%u_%u", vrf_id, conf.dir24_8.num_tbl8); | ||
| fib = rte_fib_create(name, SOCKET_ID_ANY, &conf); | ||
| if (fib == NULL) | ||
| return errno_set_null(rte_errno); | ||
|
|
||
| struct rte_fib_rcu_config rcu_config = { | ||
| .v = gr_datapath_rcu(), .mode = RTE_FIB_QSBR_MODE_SYNC | ||
| }; | ||
| ret = rte_fib_rcu_qsbr_add(fib, &rcu_config); | ||
| if (ret < 0) { | ||
| rte_fib_free(fib); | ||
| return errno_set_null(-ret); | ||
| } | ||
|
|
||
| // DPDK may round up num_tbl8, store the actual allocated size | ||
| uint32_t actual; | ||
| if (rte_fib_tbl8_get_stats(fib, NULL, &actual) == 0) | ||
| fib4_sizes[vrf_id] = actual; | ||
|
|
||
| return fib; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find and examine the create_fib function and related code
fd -e c -e h | grep -E 'modules/ip.*route' | head -5Repository: DPDK/grout
Length of output: 187
🏁 Script executed:
# Get the create_fib function and surrounding context
grep -n "create_fib\|fib4_conf_set\|rte_fib_create" modules/ip/control/route.c | head -20Repository: DPDK/grout
Length of output: 579
🏁 Script executed:
# Read the create_fib function completely
sed -n '72,98p' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 798
🏁 Script executed:
# Read the fib4_conf_set function to understand the FIB creation and deletion order
sed -n '642,695p' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 1581
🏁 Script executed:
# Also check what fib4_make_conf does and the conf.dir24_8.num_tbl8 value
sed -n '40,70p' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 731
🏁 Script executed:
# Verify the exact flow: check if fib4_sizes is indeed overwritten before create_fib is called
sed -n '656,681p' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 723
🏁 Script executed:
# Check if rte_fib_create enforces unique names (by looking for error handling)
rg -A 3 'rte_fib_create' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 159
🏁 Script executed:
# Look for any comments or documentation about DPDK FIB name uniqueness
rg -i 'unique|eexist|EEXIST|name collision|naming' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 120
🏁 Script executed:
# Check the IPv6 route.c to see if it has similar logic or if it handles this differently
grep -n 'snprintf.*fib6_vrf\|create_fib6' modules/ip6/control/route.c | head -10Repository: DPDK/grout
Length of output: 386
🏁 Script executed:
# Check what FIB4_SIZE_DEFAULT is and understand initialization
grep -n 'FIB4_SIZE_DEFAULT\|fib4_sizes\[' modules/ip/control/route.c | head -15Repository: DPDK/grout
Length of output: 318
🏁 Script executed:
# Search for DPDK RTE_FIB documentation or error handling related to name uniqueness
rg -i 'rte_fib.*name\|duplicate.*name' --type c | head -10Repository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Check if there are any comments in the code about FIB name generation or DPDK behavior
sed -n '1,100p' modules/ip/control/route.c | grep -i 'dpdk\|round\|actual\|allocat'Repository: DPDK/grout
Length of output: 220
🏁 Script executed:
# Verify the initialization of fib4_sizes array
grep -n 'static.*fib4_sizes\|fib4_sizes\s*=' modules/ip/control/route.c | head -5Repository: DPDK/grout
Length of output: 99
🏁 Script executed:
# Verify the exact scenario: trace through the size setting and FIB creation flow
# Check if fib4_sizes is restored on create_fib failure
sed -n '659,673p' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 365
🏁 Script executed:
# Look for any DPDK documentation or usage patterns about FIB name uniqueness
rg -i 'rte_fib_create.*eexist|rte_rib.*eexist' --type cRepository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Check the DPDK header or any constants that indicate DPDK name enforcement
find . -name '*.h' -exec grep -l 'rte_fib' {} \; 2>/dev/null | head -3Repository: DPDK/grout
Length of output: 130
🏁 Script executed:
# Let's verify the initialization state - is fib4_sizes zeroed initially?
grep -B 2 -A 2 'static uint32_t fib4_sizes' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 182
🏁 Script executed:
# Check if there are any tests or examples showing FIB creation behavior with duplicate names
find . -name '*.c' -o -name '*.h' | xargs grep -l 'rte_fib_create' | grep -i testRepository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Look at the actual error handling when create_fib fails to understand if name collision is handled
sed -n '671,676p' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 291
🏁 Script executed:
# Check DPDK version or any documentation in the repo about FIB name uniqueness
find . -name 'README*' -o -name 'DPDK*' -o -name 'VERSION*' | head -5Repository: DPDK/grout
Length of output: 65
🏁 Script executed:
# Verify if fib4_sizes is restored when create_fib fails (looking at the complete error path)
sed -n '671,680p' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 420
🏁 Script executed:
# Check if there's any error recovery or special handling for EEXIST in create_fib
sed -n '72,98p' modules/ip/control/route.c | grep -A 5 'rte_fib_create'Repository: DPDK/grout
Length of output: 258
🏁 Script executed:
# Look for DPDK version info or any related comments
grep -r 'DPDK\|rte_' README.md | head -10Repository: DPDK/grout
Length of output: 650
🌐 Web query:
DPDK rte_fib_create name uniqueness enforcement globally unique names
💡 Result:
rte_fib_create(name, ...) enforces unique FIB names within the DPDK EAL “instance” (shared memconfig / tailq namespace).
- In
rte_fib_create(), DPDK takes the memconfig tailq write lock, walks the globalRTE_FIBtailq, and if an existingstruct rte_fibhas the samename, it fails creation, setsrte_errno = EEXIST, and returnsNULL. The code comment is explicit: “/* guarantee there’s no existing */”. [1] - The created FIB is then inserted into that same global
RTE_FIBtailq (TAILQ_INSERT_TAIL(...)). [1] - A secondary process is expected to use
rte_fib_find_existing(name)to retrieve the already-created object by name (it scans the sameRTE_FIBtailq; on miss it setsrte_errno = ENOENT). [1] - Name length is bounded by
RTE_FIB_NAMESIZE(64). [2]
Scope note: this “global uniqueness” is per EAL instance (i.e., processes sharing the same DPDK shared memory configuration). Separate, independent EAL instances (e.g., different --file-prefix / separate shared-memory namespaces) can each have their own RTE_FIB tailq and therefore can reuse the same FIB names without conflict. [1]
Sources:
[1] DPDK lib/fib/rte_fib.c (tailq scan + EEXIST, insertion, and rte_fib_find_existing) https://raw.githubusercontent.com/DPDK/dpdk/main/lib/fib/rte_fib.c
[2] DPDK rte_fib.h (RTE_FIB_NAMESIZE) https://doc.dpdk.org/api/rte__fib_8h_source.html
FIB name collision when re-requesting a pre-rounded size.
create_fib builds the FIB name from the requested num_tbl8 (via fib4_size() → fib4_sizes[vrf_id]), but line 95 overwrites fib4_sizes[vrf_id] with the actual DPDK-allocated (possibly rounded-up) value. When a user sets a size that DPDK rounds (e.g., 1000 → 1024), then requests 1000 again:
- Line 659 sets
fib4_sizes[vrf_id] = 1000(requested). create_fib()attempts namefib4_vrf_V_1000— but the old FIB (still alive, freed only at line 680) already owns that name in DPDK's global FIB tailq.rte_fib_create()returns NULL withEEXIST.
A monotonic counter in the name ensures no two live FIBs collide.
Proposed fix
static struct rte_fib *create_fib(uint16_t vrf_id) {
struct rte_fib_conf conf = fib4_make_conf(vrf_id);
+ static _Atomic uint32_t seq;
struct rte_fib *fib;
char name[64];
int ret;
- snprintf(name, sizeof(name), "fib4_vrf_%u_%u", vrf_id, conf.dir24_8.num_tbl8);
+ snprintf(name, sizeof(name), "fib4_vrf_%u_%u", vrf_id, atomic_fetch_add(&seq, 1));
fib = rte_fib_create(name, SOCKET_ID_ANY, &conf);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip/control/route.c` around lines 72 - 98, The FIB name can collide
because create_fib() uses the requested fib4_sizes[vrf_id] when formatting name
before rte_fib_create(), so add a monotonic, thread-safe counter to the name to
guarantee uniqueness: introduce a static per-process (or per-vrf) atomic counter
(e.g., fib_name_seq or similar) and include its current value in the snprintf
format for the FIB name used by create_fib(); increment the counter atomically
before calling rte_fib_create() and otherwise keep the rest of create_fib()
logic (including updating fib4_sizes[vrf_id] with the actual allocated size
after successful creation and freeing on error) unchanged so the name never
collides even if DPDK rounded the previous allocation.
Both dir24_8 (IPv4) and trie (IPv6) algorithms already track tbl8 group allocation internally but DPDK does not expose this to applications. Monitoring tbl8 usage is needed to detect exhaustion before route insertions start failing. Add a local patch introducing rte_fib_tbl8_get_stats() and rte_fib6_tbl8_get_stats() which return the number of tbl8 groups currently in use and the total capacity. For dir24_8 the used count comes from cur_tbl8s, for trie from tbl8_pool_pos. This patch has been submitted upstream to DPDK. Link: https://patches.dpdk.org/project/dpdk/patch/20260224133340.1638295-2-rjarry@redhat.com/ Signed-off-by: Robin Jarry <rjarry@redhat.com>
When a VRF interface is deleted, the associated FIB remains allocated in vrf_fibs[] and is only cleaned up at shutdown in route_fini. This leaks DPDK memory for the lifetime of the daemon when VRFs are dynamically created and destroyed. Subscribe to GR_EVENT_IFACE_REMOVE, dereference nexthops for all remaining route entries and free the FIB as soon as the VRF interface is removed. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Move parse_uint() from main/main.c to api/string.c so that other components can reuse it without duplicating the strtoul wrapper. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Replace the hardcoded FIB size with per-VRF configuration stored in a flat array. The user-facing parameter is max_routes (number of routes). The internal num_tbl8 is auto-derived: for IPv4 DIR24_8, real-world BGP tables have less than 0.1% of prefixes longer than /24, so a ratio of max_routes/500 with a floor of 256 provides ample headroom. An optional num_tbl8 override is available for advanced use. Add GR_IP4_FIB_CONF_SET and GR_IP4_FIB_INFO_LIST API request types. The info response includes max_routes, used_routes and tbl8 statistics. When the config changes for a VRF with an active FIB, create a new FIB, migrate routes, swap the pointer, RCU sync and free the old one. Routes that fail to insert into a smaller FIB are dropped with a warning and associated local addresses are deleted to keep state consistent. To avoid conflicts during the resize window, include a sequence number in the FIB object name. Expose rib4_max_routes, fib4_max_tbl8 and fib4_used_tbl8 as new metrics. The default routing tables can now become bigger. To ensure we have enough memory to run smoke tests, expose an internal environment variable to change the default size of routing tables. Export GROUT_MAX_ROUTES_DEFAULT=128 in smoke tests. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Same approach as IPv4. The auto-derived num_tbl8 uses a ratio of max_routes*4 since IPv6 trie routes consume up to 3 tbl8 groups each with moderate sharing between prefixes. Add GR_IP6_FIB_CONF_SET and GR_IP6_FIB_INFO_LIST API request types. Live resize migrates routes via rib6_iter and deletes local addresses that fail to fit in the downsized FIB. Expose rib6_max_routes, fib6_max_tbl8 and fib6_used_tbl8 as new metrics. Reduce the default size of IPv6 routing tables to allow smoke tests to pass with the default test mode memory (2GB). Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add "route config set" and "route config show" CLI commands. The set command accepts rib4-routes, fib4-tbl8, rib6-routes and fib6-tbl8 per VRF. The show command displays routes and tbl8 usage as "used/max (pct%)". Extend cli_route_ops with optional config_set and config_show callbacks so that IPv4 and IPv6 modules register their own handlers. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Exercise the route config CLI commands in config_vrf_test: set custom rib4-routes and rib6-routes values, add addresses and verify route counts, then resize down and check that excess routes and addresses are dropped. Signed-off-by: Robin Jarry <rjarry@redhat.com>
In test mode, grout passes -m with a default memory size to EAL. If the user also passes -m via extra EAL args (e.g. grout -t -- -m 4096), EAL rejects the duplicate flag. Skip the default -m when the user has already provided one so that tests requiring more memory (like FIB full view injection) can override it. Signed-off-by: Robin Jarry <rjarry@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
modules/ip/control/route.c (1)
811-827:⚠️ Potential issue | 🔴 CriticalMissing RCU synchronization before freeing FIB.
This is the same issue flagged in previous review. After
vrf_fibs[iface->id] = NULL, datapath workers may still be executing lookups on the oldfib. The FIB is freed without waiting for quiescent state.Proposed fix
if (fib != NULL) { LOG(INFO, "destroying IPv4 FIB for VRF %s(%u)", iface->name, iface->id); rib4_iter_one(rte_fib_get_rib(fib), iface->id, route4_vrf_rm, (void *)iface); + rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID); rte_fib_free(fib); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/ip/control/route.c` around lines 811 - 827, In iface_rm_cb, after clearing vrf_fibs[iface->id] but before calling rte_fib_free(fib), wait for RCU readers to finish so datapath lookups won't use the freed FIB; insert an RCU grace-period synchronization (e.g., call rte_rcu_synchronize() or synchronize_rcu() as appropriate) between vrf_fibs[iface->id] = NULL and rte_fib_free(fib), and add any necessary include(s) so the symbol is available; keep the rest of the logic (rib4_iter_one/rte_fib_get_rib/route4_vrf_rm) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/ip6/control/route.c`:
- Around line 852-868: The callback iface_rm_cb frees a FIB while leaving a
window where datapath workers may still reference the old pointer; after nulling
vrf_fibs[iface->id] and before calling rte_fib6_free(fib) you must perform RCU
synchronization (call rte_rcu_qsbr_synchronize or the project's equivalent) to
wait for ongoing lookups to finish; update iface_rm_cb to call
rte_rcu_qsbr_synchronize() (or the same synchronization used in fib6_conf_set)
between setting vrf_fibs[iface->id] = NULL and rte_fib6_free(fib) to prevent
use-after-free.
- Around line 53-56: fib6_auto_tbl8 multiplies max_routes by 4 and can overflow
for large max_routes; update fib6_conf_set to validate and reject values >
UINT32_MAX/4 (or otherwise out-of-range) before calling fib6_auto_tbl8,
returning an error and avoiding configuration mutation; reference the
fib6_auto_tbl8(max_routes) helper and the fib6_conf_set(config, max_routes)
entry point so the check is performed centrally and prevents wraparound when
computing n = max_routes * 4.
---
Duplicate comments:
In `@modules/ip/control/route.c`:
- Around line 811-827: In iface_rm_cb, after clearing vrf_fibs[iface->id] but
before calling rte_fib_free(fib), wait for RCU readers to finish so datapath
lookups won't use the freed FIB; insert an RCU grace-period synchronization
(e.g., call rte_rcu_synchronize() or synchronize_rcu() as appropriate) between
vrf_fibs[iface->id] = NULL and rte_fib_free(fib), and add any necessary
include(s) so the symbol is available; keep the rest of the logic
(rib4_iter_one/rte_fib_get_rib/route4_vrf_rm) unchanged.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
api/gr_string.hapi/string.cmain/dpdk.cmain/main.cmeson.buildmodules/infra/cli/gr_cli_l3.hmodules/infra/cli/route.cmodules/ip/api/gr_ip4.hmodules/ip/cli/route.cmodules/ip/control/route.cmodules/ip6/api/gr_ip6.hmodules/ip6/cli/route.cmodules/ip6/control/address.cmodules/ip6/control/gr_ip6_control.hmodules/ip6/control/route.csmoke/_init.shsmoke/config_vrf_test.shsmoke/fib4_fullview_manualtest.shsmoke/fib6_fullview_manualtest.shsmoke/fib_inject.csubprojects/dpdk.wrapsubprojects/packagefiles/dpdk/fib-expose-tbl8-usage-statistics.patch
🚧 Files skipped from review as they are similar to previous changes (4)
- modules/ip/api/gr_ip4.h
- modules/ip6/api/gr_ip6.h
- modules/infra/cli/gr_cli_l3.h
- modules/infra/cli/route.c
| static inline uint32_t fib6_auto_tbl8(uint32_t max_routes) { | ||
| uint32_t n = max_routes * 4; | ||
| return n < 256 ? 256 : n; | ||
| } |
There was a problem hiding this comment.
max_routes * 4 overflows for large values.
When max_routes > UINT32_MAX / 4 (~1.07 billion), fib6_auto_tbl8 wraps around. fib6_conf_set should reject such values upfront.
Proposed fix in fib6_conf_set
+ if (req->max_routes > UINT32_MAX / 4)
+ return api_out(ERANGE, 0, NULL);
+
old_max = fib6_get_max_routes(req->vrf_id);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip6/control/route.c` around lines 53 - 56, fib6_auto_tbl8 multiplies
max_routes by 4 and can overflow for large max_routes; update fib6_conf_set to
validate and reject values > UINT32_MAX/4 (or otherwise out-of-range) before
calling fib6_auto_tbl8, returning an error and avoiding configuration mutation;
reference the fib6_auto_tbl8(max_routes) helper and the fib6_conf_set(config,
max_routes) entry point so the check is performed centrally and prevents
wraparound when computing n = max_routes * 4.
| static void iface_rm_cb(uint32_t /*ev_type*/, const void *obj) { | ||
| const struct iface *iface = obj; | ||
| struct rte_fib6 *fib; | ||
|
|
||
| if (iface->type != GR_IFACE_TYPE_VRF) | ||
| return; | ||
|
|
||
| memset(&fib6_conf[iface->id], 0, sizeof(fib6_conf[0])); | ||
| fib = vrf_fibs[iface->id]; | ||
| vrf_fibs[iface->id] = NULL; | ||
| if (fib != NULL) { | ||
| LOG(INFO, "destroying IPv6 FIB for VRF %s(%u)", iface->name, iface->id); | ||
| rib6_iter_one(rte_fib6_get_rib(fib), iface->id, route6_vrf_rm, (void *)iface); | ||
| rte_fib6_free(fib); | ||
| } | ||
| memset(route_counts[iface->id], 0, sizeof(route_counts[iface->id])); | ||
| } |
There was a problem hiding this comment.
Missing RCU synchronization before freeing FIB — potential use-after-free.
After vrf_fibs[iface->id] = NULL (line 861), datapath workers may still hold the old fib pointer mid-lookup. fib6_conf_set (line 754) correctly calls rte_rcu_qsbr_synchronize before rte_fib6_free, but iface_rm_cb does not.
Proposed fix
if (fib != NULL) {
LOG(INFO, "destroying IPv6 FIB for VRF %s(%u)", iface->name, iface->id);
rib6_iter_one(rte_fib6_get_rib(fib), iface->id, route6_vrf_rm, (void *)iface);
+ rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID);
rte_fib6_free(fib);
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip6/control/route.c` around lines 852 - 868, The callback iface_rm_cb
frees a FIB while leaving a window where datapath workers may still reference
the old pointer; after nulling vrf_fibs[iface->id] and before calling
rte_fib6_free(fib) you must perform RCU synchronization (call
rte_rcu_qsbr_synchronize or the project's equivalent) to wait for ongoing
lookups to finish; update iface_rm_cb to call rte_rcu_qsbr_synchronize() (or the
same synchronization used in fib6_conf_set) between setting vrf_fibs[iface->id]
= NULL and rte_fib6_free(fib) to prevent use-after-free.
Add fib_inject, a C client that injects routes directly via the grout
API with a realistic prefix length distribution modeled after the
global BGP table:
IPv4: 62% /24, 11% /22, 11% /23, and smaller shares of /16-/21
with ~1% /32 host routes.
IPv6: 50% /48, 11% /44, 10% /40, 13% /32, and smaller shares of
/36-/46 with ~1% /128 host routes.
It creates 2048 distinct blackhole nexthops and distributes routes
across them round-robin so that DPDK does not recycle tbl8 groups
that would otherwise become homogeneous with a single nexthop value.
Add manual tests for IPv4 (~1M prefixes) and IPv6 (~200K prefixes)
full view injection with extra EAL memory. These are not run by CI.
Link: https://bgp.potaroo.net/as2.0/bgp-active.html
Link: https://bgp.potaroo.net/v6/as2.0/index.html
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Switch FIB RCU mode from QSBR_MODE_SYNC to QSBR_MODE_DQ. In sync mode, every tbl8 group free blocks the control plane until all datapath workers pass through a quiescent state. The IPv6 trie triggers frequent tbl8 allocation and recycling during route insertion, causing the control plane to stall on rte_rcu_qsbr_synchronize after nearly every route add. Profiling a 200K IPv6 route injection showed 96% of control plane time spent in rte_rcu_qsbr_synchronize. Switching to deferred queue mode pushes freed tbl8 indices to a lock-free ring that is lazily reclaimed when new tbl8 groups are needed. This brings IPv6 route insertion from 1.3K routes/s to 77K routes/s (~57x improvement). IPv4 is largely unaffected since DIR24_8 rarely recycles tbl8 groups. The deferred queue uses slightly more memory (one ring of num_tbl8 entries) and freed tbl8 groups remain occupied until reclaimed, but the tbl8 pool is already sized for the worst case so this is not a concern. Signed-off-by: Robin Jarry <rjarry@redhat.com>
DPDK rte_fib/rte_fib6 require fixed
max_routesandnum_tbl8at creation time and cannot be resized in-place. Currently grout hardcodes both to 65536 and 16384 respectively.This series adds
route config setandroute config showCLI commands to configure FIB capacity per VRF with independent IPv4 and IPv6 values. The user-facing parameter is the maximum number of routes (rib4-routes/rib6-routes). The internalnum_tbl8is auto-derived from real-world BGP prefix length distributions: 1/500 for IPv4 (less than 0.1% of prefixes are longer than /24) and 4x for IPv6 (trie routes consume up to 3 tbl8 groups with sharing). An optionalfib4-tbl8/fib6-tbl8override is available for advanced use.Live resizing creates a new FIB, migrates all routes, swaps the pointer, performs an RCU synchronization, and frees the old FIB. When downsizing, routes that fail to insert are dropped with a warning and associated local addresses are deleted to keep state consistent. Route and tbl8 usage are exposed via openmetrics and
route config show.A
fib_injecttool is included for full view injection testing with realistic prefix length distributions based on CIDR report data (~1M IPv4 / ~200K IPv6 prefixes).