Conversation
Introduce the VXLAN interface type for the L2 module. A VXLAN interface carries a VNI (VXLAN Network Identifier), a local VTEP address used as the outer IP source, an encapsulation VRF for underlay routing, and a configurable UDP destination port (default 4789). VXLAN interfaces are keyed by (VNI, encap_vrf_id) in a lockfree RCU-protected hash table so that the datapath can resolve incoming tunneled packets to the correct interface without locks. VXLAN interfaces are intended to be attached to a bridge domain. All L2 traffic entering the bridge is forwarded transparently over the VXLAN tunnel. The local VTEP address must already be configured in the encapsulation VRF. Signed-off-by: Robin Jarry <rjarry@redhat.com>
VXLAN uses UDP port 4789 by default but allows configuring a custom destination port per interface. Allow the control plane to register additional UDP ports at runtime as aliases for an already registered port, reusing the same datapath edge. Use reference counting so that multiple interfaces sharing the same non-default port do not interfere with each other during teardown. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Wire up the VXLAN interface's configurable destination port to the L4 input node. When a non-default port is configured, register it as an alias for the standard VXLAN port (4789) so that the datapath delivers matching UDP packets to the vxlan_input node. Unregister the alias when the port changes or the interface is destroyed. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce a transport-agnostic flood list framework for BUM traffic (Broadcast, Unknown unicast, Multicast). In EVPN, each PE maintains a flooding list built from IMET routes (RFC 8365, RFC 9572). The entries in this list differ depending on the overlay encapsulation: VXLAN uses a remote VTEP IPv4 address and a VNI, while SRv6 would use a 128-bit SID. The API defines a gr_flood_entry structure with a type discriminant and a union, allowing future encapsulation types (e.g. SRv6 SIDs) to be added without changing the API request types. A dispatch layer in control/flood.c routes add/del/list operations to type-specific callbacks registered at init time. Implement the VXLAN VTEP flood type (GR_FLOOD_T_VTEP). Each VXLAN interface maintains a per-VNI array of remote VTEP addresses used by the vxlan_flood datapath node for ingress replication. The array is replaced atomically with an RCU synchronization barrier so that the datapath never sees a partially updated list. CLI commands are exposed under "flood vtep add/del/show". Add new generated grcli-flood(1) man page. Signed-off-by: Robin Jarry <rjarry@redhat.com>
In a VXLAN overlay, the bridge needs to know which remote VTEP to use when sending unicast frames to a learned MAC address. Add a VTEP IPv4 address field to FDB entries so that known unicast traffic can be sent directly to the correct tunnel endpoint instead of being flooded to all VTEPs. When bridge_input learns a MAC address from a VXLAN member interface, it records the source VTEP from the decapsulated packet's outer IP header. When forwarding to a known destination, the stored VTEP address is passed to the output path via the mbuf private data so that vxlan_output can build the correct outer header. Only set the VTEP field when the source interface is actually a VXLAN type to avoid storing uninitialized data from other packet paths (control plane, local bridge traffic). Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add three datapath nodes for VXLAN packet processing. vxlan_input decapsulates incoming UDP/4789 packets. It strips the outer UDP and VXLAN headers, resolves the inner VNI to a VXLAN interface via the RCU-protected hash table, records the source VTEP from the outer IP header into the mbuf private data, and forwards the inner Ethernet frame to iface_input for bridge processing. vxlan_output encapsulates outgoing frames for a known destination VTEP. It prepends a pre-built IP/UDP/VXLAN header template initialized by the control plane, fills in the per-packet fields (destination VTEP, UDP length, IP length, checksum), and hashes the inner flow to select an ephemeral source port for underlay ECMP (RFC 7348 Section 5). The FIB lookup for the outer IP uses the encapsulation VRF, not the bridge domain. vxlan_flood handles BUM traffic by replicating the frame to every VTEP in the flood list via ingress replication. The original mbuf is sent to the first VTEP and clones are created for the rest. The bridge_flood node is updated to steer VXLAN member traffic through vxlan_flood instead of direct iface_output. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Set up a VXLAN overlay between grout and a Linux netns peer. Grout runs a bridge with a VXLAN member (VNI 100) and the Linux side mirrors the topology with a kernel VXLAN device enslaved to a Linux bridge. Both sides have flood lists configured with each other's VTEP address for BUM traffic replication. The test verifies L3 connectivity over the tunnel by having the Linux side ping the bridge address. This exercises the full path: ARP resolution over VXLAN, FDB learning from decapsulated traffic, and ICMP echo reply via the VXLAN output encapsulation. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Report bridge interfaces to FRR as ZEBRA_IF_BRIDGE with their MAC address. Tag members with ZEBRA_IF_SLAVE_BRIDGE and propagate the bridge ifindex so that FRR can associate them with the correct master. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Report VXLAN interfaces to FRR's zebra as ZEBRA_IF_VXLAN with the associated L2 VNI information. This allows FRR's EVPN control plane to discover which VNIs are locally configured and advertise them via BGP IMET routes to remote PEs. The VXLAN L2 info includes the VNI, the local VTEP address, and the underlay interface index so that zebra can correlate the tunnel with the correct underlay routing context. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Synchronize bridge FDB entries bidirectionally between grout and FRR. This is required for EVPN to advertise locally learned MAC addresses via BGP type-2 routes and to install remotely learned MACs into the bridge forwarding table. Zebra's dplane API is asymmetric for MAC/FDB entries. In the downward direction (zebra to dplane provider), zebra uses DPLANE_OP_MAC_INSTALL and DPLANE_OP_MAC_DELETE to push MACs into the dataplane. In the upward direction (dplane provider notifying zebra of learned MACs), DPLANE_OP_NEIGH_INSTALL and DPLANE_OP_NEIGH_DELETE must be used instead. These go through zebra_neigh_macfdb_update() which calls zebra_vxlan_local_mac_add_update() and ultimately triggers BGP EVPN type-2 route advertisement. By contrast, the DPLANE_OP_MAC_* result handler (zebra_vxlan_handle_result) is a no-op. Despite the NEIGH op name, the context payload uses the macinfo union member and is populated with dplane_ctx_mac_set_*() accessors, exactly like zebra's own netlink provider does in netlink_macfdb_change(). Unlike routes and nexthops which use higher-level zebra APIs that resolve the namespace from the VRF ID, the FDB notification path looks up interfaces via if_lookup_by_index_per_ns(ns_id, ifindex). GROUT_NS must therefore be set on the dplane context for the interface lookup to succeed. Function names follow zebra's rt_netlink.c naming conventions: grout_macfdb_change() for the upward notification path (like netlink_macfdb_change) and grout_macfdb_update_ctx() for the downward install path (like netlink_macfdb_update_ctx). Self-event suppression is enabled on the FDB event subscriptions to prevent feedback loops when FRR installs a MAC that was originally learned by grout. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Handle DPLANE_OP_VTEP_ADD and DPLANE_OP_VTEP_DELETE operations from FRR's EVPN control plane. When BGP learns a remote VTEP via an IMET route (EVPN type-3), zebra pushes the VTEP to the dataplane provider. The grout_vxlan_flood_update_ctx() function (named after zebra's netlink_vxlan_flood_update_ctx() in rt_netlink.c) translates these operations into GR_FLOOD_ADD/DEL requests with GR_FLOOD_T_VTEP type. This is a downward-only path: zebra pushes flood list entries to the dplane provider. There is no upward notification for VTEP flood list changes since grout does not learn VTEPs on its own, they are always provided by FRR's BGP EVPN control plane. This allows BGP EVPN to dynamically manage the per-VNI flood lists used for BUM traffic ingress replication, replacing the need for static flood list configuration via the CLI. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Set up a full EVPN/VXLAN topology between FRR+grout and a standalone FRR+Linux peer. Each side runs a bridge with a VXLAN member (VNI 100) and a host namespace. Both peers run iBGP with the l2vpn evpn address-family and advertise-all-vni. The test verifies that EVPN type-3 (IMET) routes are exchanged so that both sides install each other's VTEP in their flood lists. It then verifies end-to-end L2 connectivity by pinging between the two host namespaces through the VXLAN overlay, which exercises type-2 (MAC/IP) route advertisement and FDB synchronization. Signed-off-by: Robin Jarry <rjarry@redhat.com>
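A minimal FRR configuration for one side of such a topology might look like the following (AS number and neighbor address are illustrative, not taken from the test script):

    router bgp 65000
     neighbor 192.168.0.2 remote-as 65000
     !
     address-family l2vpn evpn
      neighbor 192.168.0.2 activate
      advertise-all-vni
     exit-address-family

With `advertise-all-vni`, zebra reports every locally configured VNI to bgpd, which then originates the type-3 (IMET) routes that the test checks for.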
Expose per-bridge forwarding, drop and learning counters aggregated across all datapath cores. Each counter category is maintained in per-core arrays to avoid cache contention in the datapath, then summed on demand when the control plane queries them. The bridge_stats structure tracks unicast, broadcast, multicast and flood forwarding, various drop reasons (no FDB entry, hairpin, interface down), and learning outcomes (success, update, failure, limit exceeded). The fdb_stats structure tracks lookup hits, misses and aged entries. Statistics are cleared when a bridge is destroyed. A reset API allows clearing counters without destroying the bridge. CLI commands are exposed under "bridge stats show/reset". Add the "bridge" CLI command group to generated man pages. Signed-off-by: Fabien Dupont <fdupont@redhat.com>
📝 Walkthrough

This pull request introduces comprehensive VXLAN (Virtual Extensible LAN) support across the control plane and datapath. Changes include a new GR_IFACE_TYPE_VXLAN interface type in the infrastructure API, VXLAN interface configuration and management in the L2 control module, VXLAN flood VTEP registration and management, and FDB extensions to track remote VTEPs for learned MAC addresses. The datapath receives three new nodes for VXLAN input parsing, output encapsulation, and flood replication. FRR integration extends zebra dplane handlers to manage MAC FDB and VXLAN VTEP operations. Bridge statistics collection and CLI commands for VXLAN interface and flood VTEP management are added, along with test scripts validating VXLAN and EVPN functionality.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@frr/if_grout.c`:
- Around line 171-178: The code incorrectly maps the VXLAN underlay VRF by
calling ifindex_grout_to_frr on gr_vxlan->encap_vrf_id (which is a VRF id)
resulting in IFINDEX_INTERNAL; replace the mapping with the proper VRF converter
vrf_grout_to_frr and assign its result to vi.ifindex_link so the VXLAN underlay
VRF (gr_vxlan->encap_vrf_id) is correctly conveyed to
dplane_ctx_set_ifp_vxlan_info via vi.ifindex_link.
In `@frr/rt_grout.c`:
- Around line 902-910: The debug log prints the same ifindex for both bridge and
iface; update the gr_log_debug call so bridge uses
dplane_ctx_mac_get_br_ifindex(ctx) and iface uses dplane_ctx_get_ifindex(ctx)
(i.e., replace the first/bridge argument to call dplane_ctx_mac_get_br_ifindex
instead of dplane_ctx_get_ifindex) so the log matches the request behavior; keep
the rest of the parameters (dplane_ctx_mac_get_addr, dplane_ctx_mac_get_vlan,
dplane_ctx_mac_get_vtep_ip) unchanged.
In `@modules/infra/datapath/gr_rxtx.h`:
- Around line 44-47: Packet metadata field iface_mbuf_data.vtep is left
uninitialized in some packet creation paths and can be used by vxlan_output for
routing/encapsulation; fix by explicitly initializing iface_mbuf_data.vtep (or
zeroing the entire iface_mbuf_data struct) whenever an mbuf is allocated or when
control-plane packets are constructed (e.g., the code paths that call the mbuf
allocation / packet creation functions referenced around ctlplane creation), and
ensure any helper that fills iface_mbuf_data (or the mbuf initializer) sets vtep
= IPADDR_ANY/0 before handing the packet to vxlan_output.
In `@modules/l2/cli/vxlan.c`:
- Around line 46-97: parse_vxlan_args currently continues parsing VXLAN fields
even if parse_iface_args failed (returned 0) and later args can overwrite errno;
after calling parse_iface_args (in parse_vxlan_args, where set_attrs is
assigned), immediately check if set_attrs == 0 and errno != ENOENT and if so
return 0 to short-circuit and preserve the original error, preventing further
VXLAN parsing and errno clobbering.
In `@modules/l2/control/vxlan.c`:
- Around line 77-81: The code currently deletes the old hash entry
(rte_hash_del_key with cur_key) before adding the new one, which can orphan the
VXLAN interface if rte_hash_add_key_data(vxlan_hash, &next_key, iface) fails;
change the sequence in the code that updates keys so you call
rte_hash_add_key_data(vxlan_hash, &next_key, iface) first and check its return
value, and only if it succeeds then call rte_hash_del_key(vxlan_hash, &cur_key);
ensure you reference the same iface pointer and preserve error handling (return
errno_log(-ret, "rte_hash_add_key_data") on add failure) and keep
vxlan_get_iface behavior intact.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (35)
docs/meson.build, frr/if_grout.c, frr/rt_grout.c, frr/rt_grout.h, frr/zebra_dplane_grout.c, frr/zebra_dplane_grout.h, modules/infra/api/gr_infra.h, modules/infra/control/ctlplane.c, modules/infra/control/iface.c, modules/infra/datapath/gr_rxtx.h, modules/l2/api/gr_l2.h, modules/l2/cli/fdb.c, modules/l2/cli/flood.c, modules/l2/cli/l2_stats.c, modules/l2/cli/meson.build, modules/l2/cli/vxlan.c, modules/l2/control/bridge.c, modules/l2/control/fdb.c, modules/l2/control/flood.c, modules/l2/control/gr_l2_control.h, modules/l2/control/l2_stats.c, modules/l2/control/meson.build, modules/l2/control/vxlan.c, modules/l2/datapath/bridge_flood.c, modules/l2/datapath/bridge_input.c, modules/l2/datapath/l2_datapath.h, modules/l2/datapath/meson.build, modules/l2/datapath/vxlan_flood.c, modules/l2/datapath/vxlan_input.c, modules/l2/datapath/vxlan_output.c, modules/l4/gr_l4.h, modules/l4/l4_input_local.c, smoke/bridge_test.sh, smoke/evpn_vxlan_frr_test.sh, smoke/vxlan_test.sh
    if (gr_vxlan) {
        struct zebra_l2info_vxlan vi = {0};
        vi.vni_info.iftype = ZEBRA_VXLAN_IF_VNI;
        vi.vni_info.vni.vni = gr_vxlan->vni;
        vi.ifindex_link = ifindex_grout_to_frr(gr_vxlan->encap_vrf_id);
        vi.vtep_ip.s_addr = gr_vxlan->local;
        dplane_ctx_set_ifp_vxlan_info(ctx, &vi);
    }
Line 175 incorrectly maps encap_vrf_id via ifindex_grout_to_frr().
encap_vrf_id is a VRF identifier (documented as "L3 domain for underlay routing"), not an interface ID. The code passes this VRF ID through an interface-index mapper, which returns IFINDEX_INTERNAL for unmapped or default VRFs, producing an invalid ifindex_link. The proper function for VRF conversion exists and is used elsewhere in the codebase (vrf_grout_to_frr()). This misconfiguration prevents VXLAN from correctly communicating its underlay VRF to FRR.
    gr_log_debug(
        "%s bridge=%u iface=%u mac=%pEA vlan=%u vtep=%pI4",
        add ? "add" : "del",
        ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx)),
        ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx)),
        dplane_ctx_mac_get_addr(ctx),
        dplane_ctx_mac_get_vlan(ctx),
        dplane_ctx_mac_get_vtep_ip(ctx)
    );
Debug log uses dplane_ctx_get_ifindex for both bridge= and iface=.
Lines 905–906 both call ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx)), so the log prints the same value for bridge and iface. The actual request (lines 922–923) correctly distinguishes them using dplane_ctx_mac_get_br_ifindex(ctx) for the bridge. The log should match.
Proposed fix:

    gr_log_debug(
        "%s bridge=%u iface=%u mac=%pEA vlan=%u vtep=%pI4",
        add ? "add" : "del",
    -   ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx)),
    +   ifindex_frr_to_grout(dplane_ctx_mac_get_br_ifindex(ctx)),
        ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx)),
        dplane_ctx_mac_get_addr(ctx),
        dplane_ctx_mac_get_vlan(ctx),
        dplane_ctx_mac_get_vtep_ip(ctx)
    );
    GR_MBUF_PRIV_DATA_TYPE(iface_mbuf_data, {
        uint16_t vlan_id;
        ip4_addr_t vtep;
    });
Initialize iface_mbuf_data.vtep when creating packets.
Packets without explicit vtep initialization (e.g., control plane packets in ctlplane.c:209-210) can reach vxlan_output and use uninitialized vtep as a routing lookup key and encapsulation destination address, resulting in packet misdirection. Ensure all packet creation paths initialize vtep explicitly, or zero the iface_mbuf_data structure.
    static uint64_t parse_vxlan_args(
        struct gr_api_client *c,
        const struct ec_pnode *p,
        struct gr_iface *iface,
        bool update
    ) {
        struct gr_iface_info_vxlan *vxlan;
        uint64_t set_attrs;

        set_attrs = parse_iface_args(c, p, iface, sizeof(*vxlan), update);

        vxlan = (struct gr_iface_info_vxlan *)iface->info;

        if (arg_u32(p, "VNI", &vxlan->vni) < 0) {
            if (errno != ENOENT)
                return 0;
        } else {
            set_attrs |= GR_VXLAN_SET_VNI;
        }

        if (arg_ip4(p, "LOCAL", &vxlan->local) < 0) {
            if (errno != ENOENT)
                return 0;
        } else {
            set_attrs |= GR_VXLAN_SET_LOCAL;
        }

        if (arg_str(p, "ENCAP_VRF") != NULL) {
            if (arg_vrf(c, p, "ENCAP_VRF", &vxlan->encap_vrf_id) < 0)
                return 0;
            else
                set_attrs |= GR_VXLAN_SET_ENCAP_VRF;
        }

        if (arg_u16(p, "DST_PORT", &vxlan->dst_port) < 0) {
            if (errno != ENOENT)
                return 0;
        } else {
            set_attrs |= GR_VXLAN_SET_DST_PORT;
        }

        if (arg_eth_addr(p, "MAC", &vxlan->mac) < 0) {
            if (errno != ENOENT)
                return 0;
        } else {
            set_attrs |= GR_VXLAN_SET_MAC;
        }

        if (set_attrs == 0)
            errno = EINVAL;
        return set_attrs;
    }
parse_iface_args error not short-circuited before VXLAN arg parsing
If parse_iface_args returns 0 due to a real error (e.g., ENAMETOOLONG, ENODEV on update), parse_vxlan_args continues to parse VXLAN-specific attributes. If at least one VXLAN arg is present, the function returns a non-zero set_attrs — silently discarding the base-parse failure. The stale errno may also get overwritten at line 94–95.
The resulting request is sent to the server with a zeroed/invalid iface.name (add) or unresolved iface.id (set), forcing an unnecessary round-trip and potentially a confusing error response rather than an early client-side error.
🛡️ Suggested guard:

    set_attrs = parse_iface_args(c, p, iface, sizeof(*vxlan), update);
    + if (set_attrs == 0 && errno != ENOENT)
    +     return 0;
    vxlan = (struct gr_iface_info_vxlan *)iface->info;
    rte_hash_del_key(vxlan_hash, &cur_key);

    ret = rte_hash_add_key_data(vxlan_hash, &next_key, iface);
    if (ret < 0)
        return errno_log(-ret, "rte_hash_add_key_data");
Bug: old hash entry is deleted before the new one is confirmed, risking orphaned state on failure.
If rte_hash_add_key_data at line 79 fails (e.g., table full), the old key was already removed at line 77. The VXLAN interface becomes unreachable via vxlan_get_iface with no recovery path.
Add the new key first, then delete the old one:
Proposed fix:

    - rte_hash_del_key(vxlan_hash, &cur_key);
    -
      ret = rte_hash_add_key_data(vxlan_hash, &next_key, iface);
      if (ret < 0)
          return errno_log(-ret, "rte_hash_add_key_data");
    + rte_hash_del_key(vxlan_hash, &cur_key);
    +
      cur->vni = next->vni;
rjarry left a comment:
I wonder if using rte_graph xstats would be a better direction here. What do you think?
    printf(" no_fdb: %lu\n", stats->no_fdb_drop);
    printf(" hairpin: %lu\n", stats->hairpin_drop);
    printf(" iface_down: %lu\n", stats->iface_down_drop);
these seem duplicate with software stats (we have nodes to represent such drops)
    extern struct bridge_stats l2_bridge_stats[L2_MAX_BRIDGES][RTE_MAX_LCORE];
    extern struct fdb_stats l2_fdb_stats[L2_MAX_BRIDGES][RTE_MAX_LCORE];

    static inline struct bridge_stats *bridge_get_stats(uint16_t bridge_id, uint16_t lcore_id) {
        if (bridge_id >= L2_MAX_BRIDGES)
            return NULL;
        return &l2_bridge_stats[bridge_id][lcore_id];
    }

    static inline struct fdb_stats *fdb_get_stats(uint16_t bridge_id, uint16_t lcore_id) {
        if (bridge_id >= L2_MAX_BRIDGES)
            return NULL;
        return &l2_fdb_stats[bridge_id][lcore_id];
    }
this patch probably lacks some datapath code to actually increment the statistics?
Good suggestion, I looked into rte_node_xstats. The main trade-off is granularity: xstats are per-node, not per-bridge. Since bridge_input is a single shared node processing all bridges, the counters would be global aggregates. With multiple bridges, you couldn't tell which bridge is seeing hairpin drops or FDB misses. If per-bridge breakdown isn't needed (and the generic iface_stats already cover per-interface tx/rx), then xstats would be simpler and integrate with the existing graph stats infrastructure without adding parallel arrays. If per-bridge granularity matters, the custom [bridge_id][lcore_id] indexing is needed because xstats don't have a way to partition by a runtime key. What's your preference? I can rework to xstats if global counters are sufficient.
Good point. I don't really know which is more important. I am tempted to go towards xstats but it will require some base framework changes to allow reporting the stats without any node "name". Perhaps you could store these as bridge/fdb specific counters for now and expose them under … Thanks!
Summary
Expose per-bridge forwarding, drop and learning counters aggregated
across all datapath cores. Each counter category is maintained in
per-core arrays to avoid cache contention in the datapath, then
summed on demand when the control plane queries them.
Depends on #504.
The bridge_stats structure tracks unicast, broadcast, multicast and
flood forwarding, various drop reasons (no FDB entry, hairpin,
interface down), and learning outcomes (success, update, failure,
limit exceeded). The fdb_stats structure tracks lookup hits, misses
and aged entries.
Statistics are cleared when a bridge is destroyed. A reset API
allows clearing counters without destroying the bridge.
CLI commands are exposed under `bridge stats show/reset`.

Test plan:
- meson test
- smoke/run.sh
- `grcli bridge stats show` displays counters after traffic
- `grcli bridge stats reset` clears counters