l2: add bridge and FDB statistics#516

Open
fabiendupont wants to merge 13 commits into DPDK:main from fabiendupont:l2-stats

Conversation


@fabiendupont fabiendupont commented Feb 24, 2026

Summary

Expose per-bridge forwarding, drop and learning counters aggregated
across all datapath cores. Each counter category is maintained in
per-core arrays to avoid cache contention in the datapath, then
summed on demand when the control plane queries them.

Depends on #504.

The bridge_stats structure tracks unicast, broadcast, multicast and
flood forwarding, various drop reasons (no FDB entry, hairpin,
interface down), and learning outcomes (success, update, failure,
limit exceeded). The fdb_stats structure tracks lookup hits, misses
and aged entries.

Statistics are cleared when a bridge is destroyed. A reset API
allows clearing counters without destroying the bridge.

CLI commands are exposed under bridge stats show/reset.

Test plan

  • Unit tests pass (meson test)
  • Smoke tests pass (smoke/run.sh)
  • grcli bridge stats show displays counters after traffic
  • grcli bridge stats reset clears counters

Summary by CodeRabbit

  • New Features

    • Added VXLAN interface support with configurable VNI, encapsulation VRF, and destination port
    • Added flood VTEP management for VXLAN overlay networks
    • Added L2 bridge statistics with tracking for forwarding, learning, drops, and FDB entries
    • Added MAC FDB learning with remote VTEP tracking for VXLAN entries
    • Added CLI commands for VXLAN configuration, bridge statistics display, and VTEP flood management
  • Tests

    • Added VXLAN functionality tests and EVPN integration tests

rjarry and others added 13 commits February 20, 2026 21:40
Introduce the VXLAN interface type for the L2 module. A VXLAN
interface carries a VNI (VXLAN Network Identifier), a local VTEP
address used as the outer IP source, an encapsulation VRF for
underlay routing, and a configurable UDP destination port (default
4789).

VXLAN interfaces are keyed by (VNI, encap_vrf_id) in a lockfree
RCU-protected hash table so that the datapath can resolve incoming
tunneled packets to the correct interface without locks.

VXLAN interfaces are intended to be attached to a bridge domain.
All L2 traffic entering the bridge is forwarded transparently over
the VXLAN tunnel. The local VTEP address must already be configured
in the encapsulation VRF.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
VXLAN uses UDP port 4789 by default but allows configuring a custom
destination port per interface. Allow the control plane to register
additional UDP ports at runtime as aliases for an already registered
port, reusing the same datapath edge.

Use reference counting so that multiple interfaces sharing the same
non-default port do not interfere with each other during teardown.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Wire up the VXLAN interface's configurable destination port to the
L4 input node. When a non-default port is configured, register it
as an alias for the standard VXLAN port (4789) so that the datapath
delivers matching UDP packets to the vxlan_input node.

Unregister the alias when the port changes or the interface is
destroyed.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce a transport-agnostic flood list framework for BUM traffic
(Broadcast, Unknown unicast, Multicast). In EVPN, each PE maintains
a flooding list built from IMET routes (RFC 8365, RFC 9572). The
entries in this list differ depending on the overlay encapsulation:
VXLAN uses a remote VTEP IPv4 address and a VNI, while SRv6 would
use a 128-bit SID.

The API defines a gr_flood_entry structure with a type discriminant
and a union, allowing future encapsulation types (e.g. SRv6 SIDs)
to be added without changing the API request types. A dispatch
layer in control/flood.c routes add/del/list operations to
type-specific callbacks registered at init time.

Implement the VXLAN VTEP flood type (GR_FLOOD_T_VTEP). Each VXLAN
interface maintains a per-VNI array of remote VTEP addresses used
by the vxlan_flood datapath node for ingress replication. The array
is replaced atomically with an RCU synchronization barrier so that
the datapath never sees a partially updated list.

CLI commands are exposed under "flood vtep add/del/show". Add new
generated grcli-flood(1) man page.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
In a VXLAN overlay, the bridge needs to know which remote VTEP to use
when sending unicast frames to a learned MAC address. Add a VTEP IPv4
address field to FDB entries so that known unicast traffic can be sent
directly to the correct tunnel endpoint instead of being flooded to all
VTEPs.

When bridge_input learns a MAC address from a VXLAN member interface, it
records the source VTEP from the decapsulated packet's outer IP header.
When forwarding to a known destination, the stored VTEP address is
passed to the output path via the mbuf private data so that vxlan_output
can build the correct outer header.

Only set the VTEP field when the source interface is actually a VXLAN
type to avoid storing uninitialized data from other packet paths
(control plane, local bridge traffic).

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add three datapath nodes for VXLAN packet processing.

vxlan_input decapsulates incoming UDP/4789 packets. It strips the
outer UDP and VXLAN headers, resolves the inner VNI to a VXLAN
interface via the RCU-protected hash table, records the source VTEP
from the outer IP header into the mbuf private data, and forwards
the inner Ethernet frame to iface_input for bridge processing.

vxlan_output encapsulates outgoing frames for a known destination
VTEP. It prepends a pre-built IP/UDP/VXLAN header template
initialized by the control plane, fills in the per-packet fields
(destination VTEP, UDP length, IP length, checksum), and hashes the
inner flow to select an ephemeral source port for underlay ECMP
(RFC 7348 Section 5). The FIB lookup for the outer IP uses the
encapsulation VRF, not the bridge domain.

vxlan_flood handles BUM traffic by replicating the frame to every
VTEP in the flood list via ingress replication. The original mbuf
is sent to the first VTEP and clones are created for the rest.

The bridge_flood node is updated to steer VXLAN member traffic
through vxlan_flood instead of direct iface_output.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Set up a VXLAN overlay between grout and a Linux netns peer. Grout
runs a bridge with a VXLAN member (VNI 100) and the Linux side
mirrors the topology with a kernel VXLAN device enslaved to a Linux
bridge. Both sides have flood lists configured with each other's
VTEP address for BUM traffic replication.

The test verifies L3 connectivity over the tunnel by having the
Linux side ping the bridge address. This exercises the full path:
ARP resolution over VXLAN, FDB learning from decapsulated traffic,
and ICMP echo reply via the VXLAN output encapsulation.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Report bridge interfaces to FRR as ZEBRA_IF_BRIDGE with their MAC
address. Tag members with ZEBRA_IF_SLAVE_BRIDGE and propagate the
bridge ifindex so that FRR can associate them with the correct master.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Report VXLAN interfaces to FRR's zebra as ZEBRA_IF_VXLAN with the
associated L2 VNI information. This allows FRR's EVPN control
plane to discover which VNIs are locally configured and advertise
them via BGP IMET routes to remote PEs.

The VXLAN L2 info includes the VNI, the local VTEP address, and
the underlay interface index so that zebra can correlate the tunnel
with the correct underlay routing context.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Synchronize bridge FDB entries bidirectionally between grout and FRR.
This is required for EVPN to advertise locally learned MAC addresses
via BGP type-2 routes and to install remotely learned MACs into the
bridge forwarding table.

Zebra's dplane API is asymmetric for MAC/FDB entries. In the downward
direction (zebra to dplane provider), zebra uses DPLANE_OP_MAC_INSTALL
and DPLANE_OP_MAC_DELETE to push MACs into the dataplane. In the
upward direction (dplane provider notifying zebra of learned MACs),
DPLANE_OP_NEIGH_INSTALL and DPLANE_OP_NEIGH_DELETE must be used
instead. These go through zebra_neigh_macfdb_update() which calls
zebra_vxlan_local_mac_add_update() and ultimately triggers BGP EVPN
type-2 route advertisement. By contrast, the DPLANE_OP_MAC_* result
handler (zebra_vxlan_handle_result) is a no-op. Despite the NEIGH op
name, the context payload uses the macinfo union member and is
populated with dplane_ctx_mac_set_*() accessors, exactly like zebra's
own netlink provider does in netlink_macfdb_change().

Unlike routes and nexthops which use higher-level zebra APIs that
resolve the namespace from the VRF ID, the FDB notification path
looks up interfaces via if_lookup_by_index_per_ns(ns_id, ifindex).
GROUT_NS must therefore be set on the dplane context for the
interface lookup to succeed.

Function names follow zebra's rt_netlink.c naming conventions:
grout_macfdb_change() for the upward notification path (like
netlink_macfdb_change) and grout_macfdb_update_ctx() for the
downward install path (like netlink_macfdb_update_ctx).

Self-event suppression is enabled on the FDB event subscriptions
to prevent feedback loops when FRR installs a MAC that was originally
learned by grout.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Handle DPLANE_OP_VTEP_ADD and DPLANE_OP_VTEP_DELETE operations from
FRR's EVPN control plane. When BGP learns a remote VTEP via an IMET
route (EVPN type-3), zebra pushes the VTEP to the dataplane provider.

The grout_vxlan_flood_update_ctx() function (named after zebra's
netlink_vxlan_flood_update_ctx() in rt_netlink.c) translates these
operations into GR_FLOOD_ADD/DEL requests with GR_FLOOD_T_VTEP type.
This is a downward-only path: zebra pushes flood list entries to the
dplane provider. There is no upward notification for VTEP flood list
changes, since grout does not learn VTEPs on its own; they are always
provided by FRR's BGP EVPN control plane.

This allows BGP EVPN to dynamically manage the per-VNI flood lists
used for BUM traffic ingress replication, replacing the need for
static flood list configuration via the CLI.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Set up a full EVPN/VXLAN topology between FRR+grout and a
standalone FRR+Linux peer. Each side runs a bridge with a VXLAN
member (VNI 100) and a host namespace. Both peers run iBGP with
the l2vpn evpn address-family and advertise-all-vni.

The test verifies that EVPN type-3 (IMET) routes are exchanged so
that both sides install each other's VTEP in their flood lists.
It then verifies end-to-end L2 connectivity by pinging between the
two host namespaces through the VXLAN overlay, which exercises
type-2 (MAC/IP) route advertisement and FDB synchronization.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Expose per-bridge forwarding, drop and learning counters aggregated
across all datapath cores. Each counter category is maintained in
per-core arrays to avoid cache contention in the datapath, then
summed on demand when the control plane queries them.

The bridge_stats structure tracks unicast, broadcast, multicast and
flood forwarding, various drop reasons (no FDB entry, hairpin,
interface down), and learning outcomes (success, update, failure,
limit exceeded). The fdb_stats structure tracks lookup hits, misses
and aged entries.

Statistics are cleared when a bridge is destroyed. A reset API
allows clearing counters without destroying the bridge.

CLI commands are exposed under "bridge stats show/reset". Add the
"bridge" CLI command group to generated man pages.

Signed-off-by: Fabien Dupont <fdupont@redhat.com>

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Walkthrough

This pull request introduces comprehensive VXLAN (Virtual Extensible LAN) support across the control plane and datapath. Changes include a new GR_IFACE_TYPE_VXLAN interface type in the infrastructure API, VXLAN interface configuration and management in the L2 control module, VXLAN flood VTEP registration and management, and FDB extensions to track remote VTEPs for learned MAC addresses. The datapath receives three new nodes for VXLAN input parsing, output encapsulation, and flood replication. FRR integration extends zebra dplane handlers to manage MAC FDB and VXLAN VTEP operations. Bridge statistics collection and CLI commands for VXLAN interface and flood VTEP management are added, along with test scripts validating VXLAN and EVPN functionality.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@frr/if_grout.c`:
- Around line 171-178: The code incorrectly maps the VXLAN underlay VRF by
calling ifindex_grout_to_frr on gr_vxlan->encap_vrf_id (which is a VRF id)
resulting in IFINDEX_INTERNAL; replace the mapping with the proper VRF converter
vrf_grout_to_frr and assign its result to vi.ifindex_link so the VXLAN underlay
VRF (gr_vxlan->encap_vrf_id) is correctly conveyed to
dplane_ctx_set_ifp_vxlan_info via vi.ifindex_link.

In `@frr/rt_grout.c`:
- Around line 902-910: The debug log prints the same ifindex for both bridge and
iface; update the gr_log_debug call so bridge uses
dplane_ctx_mac_get_br_ifindex(ctx) and iface uses dplane_ctx_get_ifindex(ctx)
(i.e., replace the first/bridge argument to call dplane_ctx_mac_get_br_ifindex
instead of dplane_ctx_get_ifindex) so the log matches the request behavior; keep
the rest of the parameters (dplane_ctx_mac_get_addr, dplane_ctx_mac_get_vlan,
dplane_ctx_mac_get_vtep_ip) unchanged.

In `@modules/infra/datapath/gr_rxtx.h`:
- Around line 44-47: Packet metadata field iface_mbuf_data.vtep is left
uninitialized in some packet creation paths and can be used by vxlan_output for
routing/encapsulation; fix by explicitly initializing iface_mbuf_data.vtep (or
zeroing the entire iface_mbuf_data struct) whenever an mbuf is allocated or when
control-plane packets are constructed (e.g., the code paths that call the mbuf
allocation / packet creation functions referenced around ctlplane creation), and
ensure any helper that fills iface_mbuf_data (or the mbuf initializer) sets vtep
= IPADDR_ANY/0 before handing the packet to vxlan_output.

In `@modules/l2/cli/vxlan.c`:
- Around line 46-97: parse_vxlan_args currently continues parsing VXLAN fields
even if parse_iface_args failed (returned 0) and later args can overwrite errno;
after calling parse_iface_args (in parse_vxlan_args, where set_attrs is
assigned), immediately check if set_attrs == 0 and errno != ENOENT and if so
return 0 to short-circuit and preserve the original error, preventing further
VXLAN parsing and errno clobbering.

In `@modules/l2/control/vxlan.c`:
- Around line 77-81: The code currently deletes the old hash entry
(rte_hash_del_key with cur_key) before adding the new one, which can orphan the
VXLAN interface if rte_hash_add_key_data(vxlan_hash, &next_key, iface) fails;
change the sequence in the code that updates keys so you call
rte_hash_add_key_data(vxlan_hash, &next_key, iface) first and check its return
value, and only if it succeeds then call rte_hash_del_key(vxlan_hash, &cur_key);
ensure you reference the same iface pointer and preserve error handling (return
errno_log(-ret, "rte_hash_add_key_data") on add failure) and keep
vxlan_get_iface behavior intact.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16060d5 and 2cbc4a4.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (35)
  • docs/meson.build
  • frr/if_grout.c
  • frr/rt_grout.c
  • frr/rt_grout.h
  • frr/zebra_dplane_grout.c
  • frr/zebra_dplane_grout.h
  • modules/infra/api/gr_infra.h
  • modules/infra/control/ctlplane.c
  • modules/infra/control/iface.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/l2/api/gr_l2.h
  • modules/l2/cli/fdb.c
  • modules/l2/cli/flood.c
  • modules/l2/cli/l2_stats.c
  • modules/l2/cli/meson.build
  • modules/l2/cli/vxlan.c
  • modules/l2/control/bridge.c
  • modules/l2/control/fdb.c
  • modules/l2/control/flood.c
  • modules/l2/control/gr_l2_control.h
  • modules/l2/control/l2_stats.c
  • modules/l2/control/meson.build
  • modules/l2/control/vxlan.c
  • modules/l2/datapath/bridge_flood.c
  • modules/l2/datapath/bridge_input.c
  • modules/l2/datapath/l2_datapath.h
  • modules/l2/datapath/meson.build
  • modules/l2/datapath/vxlan_flood.c
  • modules/l2/datapath/vxlan_input.c
  • modules/l2/datapath/vxlan_output.c
  • modules/l4/gr_l4.h
  • modules/l4/l4_input_local.c
  • smoke/bridge_test.sh
  • smoke/evpn_vxlan_frr_test.sh
  • smoke/vxlan_test.sh

Comment on lines +171 to +178
if (gr_vxlan) {
struct zebra_l2info_vxlan vi = {0};
vi.vni_info.iftype = ZEBRA_VXLAN_IF_VNI;
vi.vni_info.vni.vni = gr_vxlan->vni;
vi.ifindex_link = ifindex_grout_to_frr(gr_vxlan->encap_vrf_id);
vi.vtep_ip.s_addr = gr_vxlan->local;
dplane_ctx_set_ifp_vxlan_info(ctx, &vi);
}

⚠️ Potential issue | 🔴 Critical



Line 175 incorrectly maps encap_vrf_id via ifindex_grout_to_frr().

encap_vrf_id is a VRF identifier (documented as "L3 domain for underlay routing"), not an interface ID. The code passes this VRF ID through an interface-index mapper, which returns IFINDEX_INTERNAL for unmapped or default VRFs, producing an invalid ifindex_link. The proper function for VRF conversion exists and is used elsewhere in the codebase (vrf_grout_to_frr()). This misconfiguration prevents VXLAN from correctly communicating its underlay VRF to FRR.


Comment on lines +902 to +910
gr_log_debug(
"%s bridge=%u iface=%u mac=%pEA vlan=%u vtep=%pI4",
add ? "add" : "del",
ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx)),
ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx)),
dplane_ctx_mac_get_addr(ctx),
dplane_ctx_mac_get_vlan(ctx),
dplane_ctx_mac_get_vtep_ip(ctx)
);

⚠️ Potential issue | 🟡 Minor

Debug log uses dplane_ctx_get_ifindex for both bridge= and iface=.

Lines 905–906 both call ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx)), so the log prints the same value for bridge and iface. The actual request (lines 922–923) correctly distinguishes them using dplane_ctx_mac_get_br_ifindex(ctx) for the bridge. The log should match.

Proposed fix
 	gr_log_debug(
 		"%s bridge=%u iface=%u mac=%pEA vlan=%u vtep=%pI4",
 		add ? "add" : "del",
-		ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx)),
+		ifindex_frr_to_grout(dplane_ctx_mac_get_br_ifindex(ctx)),
 		ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx)),
 		dplane_ctx_mac_get_addr(ctx),
 		dplane_ctx_mac_get_vlan(ctx),
 		dplane_ctx_mac_get_vtep_ip(ctx)
 	);

Comment on lines +44 to +47
GR_MBUF_PRIV_DATA_TYPE(iface_mbuf_data, {
uint16_t vlan_id;
ip4_addr_t vtep;
});

⚠️ Potential issue | 🔴 Critical



Initialize iface_mbuf_data.vtep when creating packets.

Packets without explicit vtep initialization (e.g., control plane packets in ctlplane.c:209-210) can reach vxlan_output and use uninitialized vtep as a routing lookup key and encapsulation destination address, resulting in packet misdirection. Ensure all packet creation paths initialize vtep explicitly, or zero the iface_mbuf_data structure.


Comment on lines +46 to +97
static uint64_t parse_vxlan_args(
struct gr_api_client *c,
const struct ec_pnode *p,
struct gr_iface *iface,
bool update
) {
struct gr_iface_info_vxlan *vxlan;
uint64_t set_attrs;

set_attrs = parse_iface_args(c, p, iface, sizeof(*vxlan), update);

vxlan = (struct gr_iface_info_vxlan *)iface->info;

if (arg_u32(p, "VNI", &vxlan->vni) < 0) {
if (errno != ENOENT)
return 0;
} else {
set_attrs |= GR_VXLAN_SET_VNI;
}

if (arg_ip4(p, "LOCAL", &vxlan->local) < 0) {
if (errno != ENOENT)
return 0;
} else {
set_attrs |= GR_VXLAN_SET_LOCAL;
}

if (arg_str(p, "ENCAP_VRF") != NULL) {
if (arg_vrf(c, p, "ENCAP_VRF", &vxlan->encap_vrf_id) < 0)
return 0;
else
set_attrs |= GR_VXLAN_SET_ENCAP_VRF;
}

if (arg_u16(p, "DST_PORT", &vxlan->dst_port) < 0) {
if (errno != ENOENT)
return 0;
} else {
set_attrs |= GR_VXLAN_SET_DST_PORT;
}

if (arg_eth_addr(p, "MAC", &vxlan->mac) < 0) {
if (errno != ENOENT)
return 0;
} else {
set_attrs |= GR_VXLAN_SET_MAC;
}

if (set_attrs == 0)
errno = EINVAL;
return set_attrs;
}

⚠️ Potential issue | 🟡 Minor

parse_iface_args error not short-circuited before VXLAN arg parsing

If parse_iface_args returns 0 due to a real error (e.g., ENAMETOOLONG, ENODEV on update), parse_vxlan_args continues to parse VXLAN-specific attributes. If at least one VXLAN arg is present, the function returns a non-zero set_attrs — silently discarding the base-parse failure. The stale errno may also get overwritten at line 94–95.

The resulting request is sent to the server with a zeroed/invalid iface.name (add) or unresolved iface.id (set), forcing an unnecessary round-trip and potentially a confusing error response rather than an early client-side error.

🛡️ Suggested guard
 	set_attrs = parse_iface_args(c, p, iface, sizeof(*vxlan), update);
+	if (set_attrs == 0 && errno != ENOENT)
+		return 0;
 
 	vxlan = (struct gr_iface_info_vxlan *)iface->info;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/l2/cli/vxlan.c` around lines 46 - 97, parse_vxlan_args currently
continues parsing VXLAN fields even if parse_iface_args failed (returned 0) and
later args can overwrite errno; after calling parse_iface_args (in
parse_vxlan_args, where set_attrs is assigned), immediately check if set_attrs
== 0 and errno != ENOENT and if so return 0 to short-circuit and preserve the
original error, preventing further VXLAN parsing and errno clobbering.
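The short-circuit pattern the finding calls for can be illustrated in isolation. This is a minimal sketch with hypothetical stand-ins for the gr_cli parser API (`parse_base_args` is a fake; the real convention being modeled is that ENOENT signals "attribute absent", while any other errno signals a hard parse failure):

```c
/* Hypothetical stand-ins illustrating the suggested guard; not the
 * actual grout parser code. */
#include <errno.h>
#include <stdint.h>

#define SET_VNI 0x1ULL

/* Fake base parser: returns 0 with errno=ENOENT when no base attrs
 * were given (harmless), or 0 with another errno on a hard failure. */
static uint64_t parse_base_args(int fail_hard) {
	if (fail_hard) {
		errno = ENAMETOOLONG;
		return 0;
	}
	errno = ENOENT;
	return 0;
}

static uint64_t parse_vxlan_args(int base_fails) {
	uint64_t set_attrs = parse_base_args(base_fails);

	/* The guard: a real base-parse error must short-circuit here,
	 * before type-specific parsing can overwrite errno or turn
	 * set_attrs non-zero and mask the failure. */
	if (set_attrs == 0 && errno != ENOENT)
		return 0;

	/* ... VXLAN attribute parsing would happen here ... */
	set_attrs |= SET_VNI;
	return set_attrs;
}
```

With the guard in place, a hard base failure propagates as 0 even when every VXLAN-specific attribute would have parsed cleanly.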

Comment on lines +77 to +81
rte_hash_del_key(vxlan_hash, &cur_key);

ret = rte_hash_add_key_data(vxlan_hash, &next_key, iface);
if (ret < 0)
return errno_log(-ret, "rte_hash_add_key_data");
⚠️ Potential issue | 🔴 Critical

Bug: old hash entry is deleted before the new one is confirmed, risking orphaned state on failure.

If rte_hash_add_key_data at line 79 fails (e.g., table full), the old key was already removed at line 77. The VXLAN interface becomes unreachable via vxlan_get_iface with no recovery path.

Add the new key first, then delete the old one:

Proposed fix
-		rte_hash_del_key(vxlan_hash, &cur_key);
-
 		ret = rte_hash_add_key_data(vxlan_hash, &next_key, iface);
 		if (ret < 0)
 			return errno_log(-ret, "rte_hash_add_key_data");
 
+		rte_hash_del_key(vxlan_hash, &cur_key);
+
 		cur->vni = next->vni;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/l2/control/vxlan.c` around lines 77 - 81, The code currently deletes
the old hash entry (rte_hash_del_key with cur_key) before adding the new one,
which can orphan the VXLAN interface if rte_hash_add_key_data(vxlan_hash,
&next_key, iface) fails; change the sequence in the code that updates keys so
you call rte_hash_add_key_data(vxlan_hash, &next_key, iface) first and check its
return value, and only if it succeeds then call rte_hash_del_key(vxlan_hash,
&cur_key); ensure you reference the same iface pointer and preserve error
handling (return errno_log(-ret, "rte_hash_add_key_data") on add failure) and
keep vxlan_get_iface behavior intact.

Collaborator

@rjarry rjarry left a comment
I wonder if using rte_graph xstats would be a better direction here. What do you think?

https://doc.dpdk.org/api/structrte__node__xstats.html

Comment on lines +38 to +40
printf(" no_fdb: %lu\n", stats->no_fdb_drop);
printf(" hairpin: %lu\n", stats->hairpin_drop);
printf(" iface_down: %lu\n", stats->iface_down_drop);
Collaborator
these seem to duplicate the software stats (we have nodes to represent such drops)

Comment on lines +46 to +60
extern struct bridge_stats l2_bridge_stats[L2_MAX_BRIDGES][RTE_MAX_LCORE];
extern struct fdb_stats l2_fdb_stats[L2_MAX_BRIDGES][RTE_MAX_LCORE];

static inline struct bridge_stats *bridge_get_stats(uint16_t bridge_id, uint16_t lcore_id) {
if (bridge_id >= L2_MAX_BRIDGES)
return NULL;
return &l2_bridge_stats[bridge_id][lcore_id];
}

static inline struct fdb_stats *fdb_get_stats(uint16_t bridge_id, uint16_t lcore_id) {
if (bridge_id >= L2_MAX_BRIDGES)
return NULL;
return &l2_fdb_stats[bridge_id][lcore_id];
}

Collaborator
this patch probably lacks some datapath code to actually increment the statistics?
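The counting scheme the PR describes (each lcore increments its own slot lock-free, the control plane sums all slots on demand) could look roughly like this sketch; the names and structure fields are illustrative, not the actual patch code:

```c
/* Sketch of per-lcore counters aggregated on demand; hypothetical
 * names, not the grout implementation. */
#include <stdint.h>

#define MAX_BRIDGES 4
#define MAX_LCORES 8

struct bridge_counters {
	uint64_t unicast_fwd;
	uint64_t no_fdb_drop;
};

static struct bridge_counters stats[MAX_BRIDGES][MAX_LCORES];

/* Datapath: called per packet on the owning lcore only, so plain
 * increments suffice and cores never contend on a cache line. */
static void count_unicast(uint16_t bridge_id, unsigned lcore_id) {
	stats[bridge_id][lcore_id].unicast_fwd++;
}

/* Control plane: aggregate across all lcores when queried. */
static struct bridge_counters bridge_stats_sum(uint16_t bridge_id) {
	struct bridge_counters sum = {0};

	for (unsigned l = 0; l < MAX_LCORES; l++) {
		sum.unicast_fwd += stats[bridge_id][l].unicast_fwd;
		sum.no_fdb_drop += stats[bridge_id][l].no_fdb_drop;
	}
	return sum;
}
```

The accessors in the diff above provide the `[bridge_id][lcore_id]` slot lookup; what is still missing, per this comment, is the datapath call sites that actually perform the increments.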

@fabiendupont
Author

I wonder if using rte_graph xstats would be a better direction here. What do you think?

https://doc.dpdk.org/api/structrte__node__xstats.html

Good suggestion, I looked into rte_node_xstats. The main trade-off is granularity:

xstats are per-node, not per-bridge. Since bridge_input is a single shared node processing all bridges, the counters would be global aggregates. With multiple bridges, you couldn't tell which bridge is seeing hairpin drops or FDB misses.

If per-bridge breakdown isn't needed (and the generic iface_stats already cover per-interface tx/rx), then xstats would be simpler and integrate with the existing graph stats infrastructure without adding parallel arrays.

If per-bridge granularity matters, the custom [bridge_id][lcore_id] indexing is needed because xstats don't have a way to partition by a runtime key.

What's your preference? I can rework to xstats if global counters are sufficient.

@rjarry
Collaborator

rjarry commented Feb 25, 2026

xstats are per-node, not per-bridge. Since bridge_input is a single shared node processing all bridges, the counters would be global aggregates. With multiple bridges, you couldn't tell which bridge is seeing hairpin drops or FDB misses.

Good point. I don't really know which is more important.

I am tempted to go towards xstats but it will require some base framework changes to allow reporting the stats without any node "name".

Perhaps you could store these as bridge/fdb specific counters for now and expose them under stats show fdb in the CLI. We can always revisit this later and replace it (or parts of it) with xstats.

Thanks!
