Add VXLAN/EVPN support with flood list management#504
📝 Walkthrough
This pull request adds comprehensive L2 switching capabilities including bridge and VXLAN interface types. New infrastructure includes: a dedicated L2 module with bridge management, FDB learning/aging, and flood VTEP tracking; datapath nodes for bridge ingress classification and output flooding; VXLAN encapsulation/decapsulation; integration with FRR zebra for MAC learning notifications; CLI commands for bridge, FDB, and flood management; a control queue draining mechanism for deferred event notification; and extended event types for FDB and flood operations. UDP port aliasing for VXLAN was added to the L4 layer. Supporting DPDK patches address RCU hash safety during overwrites and defer queue failures. Smoke tests validate bridge-based L2 forwarding and EVPN/VXLAN overlays.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@modules/infra/control/group_nexthop.c`:
- Line 152: The call to rte_rcu_qsbr_synchronize(gr_datapath_rcu(),
rte_lcore_id()) is using rte_lcore_id() from a control thread that is not
registered as a QSBR reader; replace the second argument with
RTE_QSBR_THRID_INVALID so the call becomes
rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID) whenever
invoked from control-plane threads (same change for any other control-plane
calls that pass rte_lcore_id()); ensure only datapath reader threads keep using
their registered thread IDs (registration happens via
rte_rcu_qsbr_thread_register in the datapath main loop).
In `@modules/l2/cli/vxlan.c`:
- Around line 73-77: arg_vrf currently returns 0 when the user omits the
ENCAP_VRF argument, but the code treats 0 as success and unconditionally sets
GR_VXLAN_SET_ENCAP_VRF, causing encap_vrf to be overwritten; fix by storing the
arg_vrf return value (e.g. int ret = arg_vrf(c, p, "ENCAP_VRF",
&vxlan->encap_vrf_id)), return on ret < 0, and only set set_attrs |=
GR_VXLAN_SET_ENCAP_VRF when ret > 0 (meaning the user actually supplied
ENCAP_VRF), leaving vxlan->encap_vrf_id untouched when the argument is absent.
In `@modules/l2/control/bridge.c`:
- Around line 60-77: bridge_detach_member currently resets member->mode to
GR_IFACE_MODE_VRF but leaves member->vrf_id as GR_VRF_ID_UNDEF; update
bridge_detach_member to restore the member's VRF by calling
vrf_default_get_or_create() and assigning the returned vrf id to member->vrf_id
and incrementing its refcount via vrf_incref (mirroring bridge_fini behavior),
then set member->mode = GR_IFACE_MODE_VRF so the detached iface has a valid VRF.
In `@modules/l2/control/vxlan.c`:
- Around line 281-287: The vtep_flood_del function mutates the shared
flood_vteps array in-place (swap-and-decrement) without RCU protection, causing
a data-race with datapath readers; change vtep_flood_del to follow the
copy-on-write + RCU pattern used by vtep_flood_add: allocate a new flood_vteps
buffer, copy entries from the old array excluding entry->vtep.addr (preserving
order if add does), set the new pointer and updated n_flood_vteps atomically
(using the same RCU/atomic swap helper used by vtep_flood_add), schedule the old
buffer to be freed after the RCU grace period, and keep the
gr_event_push(GR_EVENT_FLOOD_DEL, entry) call; reference vtep_flood_del,
vtep_flood_add, flood_vteps, n_flood_vteps, and gr_event_push when making the
change.
- Around line 50-83: The delete uses cur->encap_vrf_id after it was overwritten,
so rte_hash_del_key is built with the new encap_vrf_id instead of the old one;
fix by capturing the old encap_vrf_id (and old vni if needed) before mutating
cur (e.g., read old_vrf = cur->encap_vrf_id and build cur_key from old_vrf and
cur->vni) or postpone assigning cur->encap_vrf_id until after the hash
delete/add sequence; update the code around cur->encap_vrf_id, cur_key,
rte_hash_del_key, next_key and rte_hash_add_key_data accordingly so the deletion
targets the original {old_vni, old_vrf}.
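For the `vtep_flood_del` item above, the copy-on-write removal could look roughly like the sketch below. It is plain C with `malloc` and no RCU: `struct flood_list` and `flood_list_del` are hypothetical stand-ins, not the PR's types, and the RCU synchronization and deferred free are only indicated in comments.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the per-interface flood list. */
struct flood_list {
	uint32_t *vteps;  /* remote VTEP IPv4 addresses */
	uint16_t n_vteps; /* number of valid entries in vteps */
};

/*
 * Remove addr by building a new array instead of editing the live one.
 * Returns 0 on success, -1 if addr is absent or allocation fails.
 * On success, *old receives the previous array. The caller must free it
 * only once no reader can still hold it (an RCU grace period in real code).
 */
int flood_list_del(struct flood_list *fl, uint32_t addr, uint32_t **old) {
	uint32_t *next = NULL;
	uint16_t i, n = 0;

	assert(fl != NULL && old != NULL);

	for (i = 0; i < fl->n_vteps; i++) {
		if (fl->vteps[i] == addr)
			break;
	}
	if (i == fl->n_vteps)
		return -1; /* not in the list */

	if (fl->n_vteps > 1) {
		next = malloc((fl->n_vteps - 1) * sizeof(*next));
		if (next == NULL)
			return -1;
		/* Copy every entry except the removed one, preserving order. */
		for (i = 0; i < fl->n_vteps; i++) {
			if (fl->vteps[i] != addr)
				next[n++] = fl->vteps[i];
		}
	}

	/* Publish the new array; concurrent code needs proper ordering here. */
	*old = fl->vteps;
	fl->vteps = next;
	fl->n_vteps = n;
	return 0;
}
```

The live array is never modified, so a reader that picked up the old pointer keeps seeing a complete list until the old buffer is released.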
In `@modules/l2/datapath/vxlan_output.c`:
- Around line 75-79: vxlan_output currently assigns ip_output_mbuf_data(m)->nh =
fib4_lookup(...) without checking for NULL and always sends packets to
IP_OUTPUT; change vxlan_output to check the result of fib4_lookup (the value
stored in ip_output_mbuf_data(m)->nh) and if it is NULL enqueue the packet to
the BAD_NEXTHOP edge (the declared but unused BAD_NEXTHOP path) instead of
forwarding to IP_OUTPUT, otherwise continue to set edge = IP_OUTPUT and enqueue
as before; update the enqueue logic around rte_node_enqueue_x1(graph, node,
edge, m) so the chosen edge reflects this NULL-check.
🧹 Nitpick comments (3)
frr/if_grout.c (1)
369-378: Variable `add` shadows outer `bool add` on line 356.
`struct gr_fdb_add_req *add` (line 370) shadows the `bool add` declared at line 356. This works correctly due to block scoping, but it's a latent maintenance trap: a future refactor could easily reference the wrong `add`.
Proposed fix: rename inner variable
 if (add) {
-    struct gr_fdb_add_req *add = req;
-    add->exist_ok = true;
-    add->fdb.iface_id = ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx));
-    add->fdb.bridge_id = ifindex_frr_to_grout(dplane_ctx_mac_get_br_ifindex(ctx));
-    add->fdb.vlan_id = dplane_ctx_mac_get_vlan(ctx);
-    add->fdb.flags = dplane_ctx_mac_get_dp_static(ctx) ? GR_FDB_F_STATIC : 0;
-    memcpy(&add->fdb.mac, dplane_ctx_mac_get_addr(ctx), sizeof(add->fdb.mac));
-    add->fdb.vtep = dplane_ctx_mac_get_vtep_ip(ctx)->s_addr;
+    struct gr_fdb_add_req *add_req = req;
+    add_req->exist_ok = true;
+    add_req->fdb.iface_id = ifindex_frr_to_grout(dplane_ctx_get_ifindex(ctx));
+    add_req->fdb.bridge_id = ifindex_frr_to_grout(dplane_ctx_mac_get_br_ifindex(ctx));
+    add_req->fdb.vlan_id = dplane_ctx_mac_get_vlan(ctx);
+    add_req->fdb.flags = dplane_ctx_mac_get_dp_static(ctx) ? GR_FDB_F_STATIC : 0;
+    memcpy(&add_req->fdb.mac, dplane_ctx_mac_get_addr(ctx), sizeof(add_req->fdb.mac));
+    add_req->fdb.vtep = dplane_ctx_mac_get_vtep_ip(ctx)->s_addr;
     req_type = GR_FDB_ADD;
modules/l2/api/gr_l2.h (1)
44-49: Bit 36 skipped in VXLAN reconfiguration flags.
`GR_VXLAN_SET_LOCAL` is bit 35, `GR_VXLAN_SET_MAC` jumps to bit 37. Bit 36 is unused. If intentional (reserved for a future attribute), no problem. If a typo, it won't cause a bug now but could cause confusion later.
modules/l2/control/fdb.c (1)
329-346: Redundant `fdb_max_entries` assignment.
Line 342 sets `fdb_max_entries = req->max_entries`, but `fdb_reconfig` (line 79) already does the same assignment. Harmless, but the duplicate write could be removed.
ca74f20 to 34d418a
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
main/control_queue.c (1)
24-24: ⚠️ Potential issue | 🟡 Minor
Missing parentheses around macro definition.
`CONTROL_QUEUE_SIZE` expands unsafely in expressions due to operator precedence. While current usages happen to be fine, this is a latent bug waiting to bite.
Proposed fix
-#define CONTROL_QUEUE_SIZE RTE_GRAPH_BURST_SIZE * 4
+#define CONTROL_QUEUE_SIZE (RTE_GRAPH_BURST_SIZE * 4)
🤖 Fix all issues with AI agents
In `@frr/if_grout.c`:
- Around line 369-378: The local pointer declaration inside the if (add) block
shadows the outer bool add; change the pointer name (e.g., rename "struct
gr_fdb_add_req *add = req;" to "struct gr_fdb_add_req *add_req = req;" or
similar) and update all references in that block (fields like add->exist_ok,
add->fdb.* and any further uses) to the new identifier to avoid shadowing and
future fragility in the function that contains the if (add) check.
In `@main/event.c`:
- Around line 42-44: control_queue_push failures currently drop events silently;
update the error path in the block where control_queue_push(notify_subscribers,
(void *)obj, ev_type) < 0 to record and surface the failure: increment a
persistent error metric/counter (e.g., control_events_dropped) and emit a single
WARNING log on first occurrence (or throttled warnings thereafter) that includes
ev_type and a pointer/identifier for obj so operators can detect lost events;
ensure the new metric and warning are used wherever notify_subscribers events
are pushed so dropped events are observable.
In `@modules/l2/control/fdb.c`:
- Around line 24-37: The fdb hash is created without the lock-free concurrency
extra flags, which allows concurrent writers from datapath lcores and control
plane to corrupt the table; in fdb_reconfig set the rte_hash_parameters
extra_flags to include RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF |
RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT before calling rte_hash_create (same
pattern used by vxlan_hash) so rte_hash_create will enable lock-free RW
concurrency for fdb_hash.
In `@modules/l4/l4_input_local.c`:
- Around line 40-50: l4_input_alias_port currently overwrites udp_edges[alias]
and increments udp_refcounts even if alias already points to a different edge;
change the function to first check udp_edges[alias] and only allow aliasing when
the slot is unused or already points to the same edge as udp_edges[port].
Concretely, in l4_input_alias_port check if udp_edges[alias] != UNUSED &&
udp_edges[alias] != udp_edges[port] and return an error (e.g.,
errno_set(EADDRINUSE)) to avoid clobbering another edge; if the alias slot is
unused, set udp_edges[alias] = udp_edges[port] and increment
udp_refcounts[alias]; if it already equals the same edge, treat as idempotent
(optionally increment refcount or leave as-is per existing refcount semantics)
so l4_input_unalias_port can correctly restore state. Ensure the MANAGEMENT
check on the source port remains.
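A minimal sketch of the guarded aliasing described for `l4_input_alias_port`, assuming a flat port-to-edge table. All names here (`port_register`, `port_alias`, `port_unalias`, `port_edge`, `EDGE_UNUSED`) are hypothetical, and the MANAGEMENT check on the source port is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EDGE_UNUSED 0 /* hypothetical marker: no edge registered */
#define N_PORTS 65536

static uint8_t udp_edges[N_PORTS];      /* port -> datapath edge */
static uint16_t udp_refcounts[N_PORTS]; /* alias users per port */

/* Direct registration of a port on an edge (stand-in for the normal path). */
void port_register(uint16_t port, uint8_t edge) {
	assert(edge != EDGE_UNUSED);
	udp_edges[port] = edge;
}

/* Read back the edge currently attached to a port. */
uint8_t port_edge(uint16_t port) {
	return udp_edges[port];
}

/* Make `alias` deliver to the same edge as `port`. Returns 0 or -errno. */
int port_alias(uint16_t port, uint16_t alias) {
	uint8_t edge = udp_edges[port];

	if (port == alias)
		return -EINVAL;
	if (edge == EDGE_UNUSED)
		return -EADDRNOTAVAIL; /* nothing to alias */
	if (udp_edges[alias] != EDGE_UNUSED && udp_edges[alias] != edge)
		return -EADDRINUSE; /* would clobber another edge */

	udp_edges[alias] = edge;
	udp_refcounts[alias]++;
	return 0;
}

/* Drop one reference; release the slot when the last user goes away. */
int port_unalias(uint16_t alias) {
	if (udp_refcounts[alias] == 0)
		return -ENOENT;
	if (--udp_refcounts[alias] == 0)
		udp_edges[alias] = EDGE_UNUSED;
	return 0;
}
```

In this sketch a repeated alias to the same edge only bumps the reference count, so two interfaces sharing a non-default port can be torn down independently.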
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@modules/l2/cli/fdb.c`:
- Around line 118-126: The build fails because scols_* functions (used in
scols_new_table, scols_table_new_column, scols_table_set_column_separator, etc.)
are undeclared when NEED_SCOLS_LINE_SPRINTF is defined; to fix, ensure
<libsmartcols.h> is always included so those declarations are available—either
modify gr_table.h to include <libsmartcols.h> unconditionally (remove the
conditional that skips it when NEED_SCOLS_LINE_SPRINTF is defined) or add a
direct `#include` <libsmartcols.h> at the top of the affected CLI files (e.g.,
modules/l2/cli/fdb.c) so scols_new_table, scols_table_new_column,
scols_table_set_column_separator, and related symbols are declared.
In `@modules/l2/cli/vxlan.c`:
- Around line 73-78: The code sets GR_VXLAN_SET_ENCAP_VRF even when ENCAP_VRF is
omitted because the else is paired with the combined condition; fix by only
setting set_attrs when arg_str(p, "ENCAP_VRF") is present and arg_vrf succeeds:
change the logic so you first check if arg_str(p, "ENCAP_VRF") != NULL, then
call arg_vrf(c, p, "ENCAP_VRF", &vxlan->encap_vrf_id) and return 0 on failure,
and only after a successful arg_vrf call set GR_VXLAN_SET_ENCAP_VRF on set_attrs
(referencing arg_str, arg_vrf, vxlan->encap_vrf_id, and GR_VXLAN_SET_ENCAP_VRF).
In `@modules/l2/control/flood.c`:
- Around line 70-85: In flood_list, don't rely on errno after calling ops->list;
instead capture the return value (e.g. int ret = ops->list(...)), check if ret <
0 and propagate the error consistently (return api_out(-ret, 0, NULL)) like
flood_add/flood_del do; update the loop in flood_list to use the ret variable
and return -ret when ops->list fails or document that ops->list must set errno
if you prefer that contract.
- Around line 30-42: Validate the untrusted index before indexing the
flood_types array: in flood_add (and similarly in flood_del) check that
req->entry.type is >= 0 and < ARRAY_DIM(flood_types) and return
api_out(EAFNOSUPPORT, 0, NULL) if it is out of range; do this before reading
flood_types[req->entry.type] and before accessing ops->add/ops->del so you avoid
any out-of-bounds access.
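The bounds check on the type index might be sketched as follows. The table layout and `struct flood_ops` are assumptions for illustration only. With an unsigned type field, the lower bound holds by construction and only the upper bound needs testing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical per-type callbacks; only `add` is modeled. */
struct flood_ops {
	int (*add)(uint32_t value);
};

static int vtep_add(uint32_t value) {
	(void)value;
	return 0;
}

static const struct flood_ops vtep_ops = {.add = vtep_add};

/* Slot 0 is left empty on purpose to exercise the NULL check. */
static const struct flood_ops *flood_types[] = {NULL, &vtep_ops};

/* Validate the caller-supplied type before it is used as an index. */
int flood_add(uint32_t type, uint32_t value) {
	const struct flood_ops *ops;

	if (type >= ARRAY_DIM(flood_types))
		return -EAFNOSUPPORT; /* out of range: never index the table */

	ops = flood_types[type];
	if (ops == NULL)
		return -EAFNOSUPPORT; /* in range but nothing registered */

	assert(ops->add != NULL); /* registered ops must provide add */
	return ops->add(value);
}
```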
In `@modules/l2/control/vxlan.c`:
- Around line 250-261: The pointer/count update is racy on weakly-ordered CPUs:
after allocating and filling vteps you assign vxlan->flood_vteps = vteps and
then increment vxlan->n_flood_vteps, which can be seen by readers out-of-order
on ARM; insert a release memory barrier between those two stores by calling
rte_atomic_thread_fence(rte_memory_order_release) immediately after setting
vxlan->flood_vteps = vteps and before doing vxlan->n_flood_vteps++ so readers
cannot observe the new count without seeing the new pointer (keep the existing
rte_rcu_qsbr_synchronize and rte_free usage unchanged).
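The ordering concern can be illustrated with C11 atomics instead of the DPDK wrappers. This is a sketch of the add path only, under the assumption that readers load the count first and the pointer second; `struct flood_list`, `flood_list_add` and `flood_list_contains` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct flood_list {
	_Atomic(uint32_t *) vteps;
	atomic_uint n_vteps;
};

/*
 * Append addr with copy-on-write. Returns 0 on success, -1 on allocation
 * failure. *old receives the previous array for deferred freeing.
 */
int flood_list_add(struct flood_list *fl, uint32_t addr, uint32_t **old) {
	unsigned n = atomic_load_explicit(&fl->n_vteps, memory_order_relaxed);
	uint32_t *cur = atomic_load_explicit(&fl->vteps, memory_order_relaxed);
	uint32_t *next = malloc((n + 1) * sizeof(*next));

	assert(old != NULL);
	if (next == NULL)
		return -1;
	if (n > 0)
		memcpy(next, cur, n * sizeof(*next));
	next[n] = addr;

	atomic_store_explicit(&fl->vteps, next, memory_order_relaxed);
	/* Release fence: the array contents and pointer become visible
	 * before any reader can observe the incremented count. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&fl->n_vteps, n + 1, memory_order_relaxed);

	*old = cur;
	return 0;
}

/* Reader side: acquire the count, then read the pointer. */
int flood_list_contains(struct flood_list *fl, uint32_t addr) {
	unsigned n = atomic_load_explicit(&fl->n_vteps, memory_order_acquire);
	uint32_t *v = atomic_load_explicit(&fl->vteps, memory_order_relaxed);

	for (unsigned i = 0; i < n; i++) {
		if (v[i] == addr)
			return 1;
	}
	return 0;
}
```

A reader that observes the new count is then guaranteed to see a pointer at least as new as the one published with it, so it cannot index past the end of a shorter, older array.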
In `@modules/l2/datapath/vxlan_input.c`:
- Around line 79-85: rte_pktmbuf_adj can return NULL on insufficient headroom
but the result is ignored; modify the logic around rte_pktmbuf_adj in
vxlan_input.c to check its return value and, when NULL, set edge = NO_HEADROOM
(or jump to the existing NO_HEADROOM handling) instead of proceeding to use
iface_mbuf_data and IFACE_INPUT; ensure you do not call iface_mbuf_data or touch
the mbuf after a failed rte_pktmbuf_adj and keep existing fields (vlan_id, vtep,
iface) assignment only on success.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
frr/zebra_dplane_grout.c (1)
219-225: ⚠️ Potential issue | 🟠 Major
Add `GR_IFACE_TYPE_BRIDGE` and `GR_IFACE_TYPE_VXLAN` to startup interface sync.
The `types[]` array in `grout_sync_ifaces()` omits bridge and VXLAN interfaces. When FRR starts or reconnects after these interfaces already exist in GROUT, FRR won't discover them during the initial startup sync; only runtime events will inform FRR of their existence. Since bridges and VXLAN tunnels are typically created before FRR starts, this gap causes FRR to miss them entirely until the next modification event.
Both types are fully handled by `grout_link_change()` and simply need to be added to the sync list. Suggest adding BRIDGE before VXLAN (since VXLAN may depend on bridge) and both before or alongside VLAN:
 static const gr_iface_type_t types[] = {
     GR_IFACE_TYPE_VRF,
+    GR_IFACE_TYPE_BRIDGE,
     GR_IFACE_TYPE_BOND,
     GR_IFACE_TYPE_IPIP,
     GR_IFACE_TYPE_PORT,
+    GR_IFACE_TYPE_VXLAN,
     GR_IFACE_TYPE_VLAN,
 };
🤖 Fix all issues with AI agents
In `@frr/if_grout.c`:
- Around line 327-353: grout_fdb_change is missing setting the bridge ifindex on
the dplane context, so the handler grout_add_del_mac later reads an
uninitialized bridge via dplane_ctx_mac_get_br_ifindex(ctx); fix this by calling
dplane_ctx_mac_set_br_ifindex(ctx, ifindex_grout_to_frr(fdb->bridge_id)) (using
the existing ifindex_grout_to_frr helper) before enqueueing the context in
dplane_provider_enqueue_to_zebra(ctx) so the bridge_id is correctly propagated.
In `@modules/l2/control/vxlan.c`:
- Around line 67-83: The block mutates VRF refcounts and cur->encap_vrf_id then
deletes the old vxlan_hash key, but returns on later failures (EADDRINUSE,
ERANGE, or rte_hash_add_key_data) without rolling back, leaking refcounts and
removing the only hash entry; to fix, perform all validations first (check
next->vni range and rte_hash_lookup for next_key) before touching VRF refcounts
or calling rte_hash_del_key, or if you must change state early, add rollback
paths that on any subsequent error: restore the original cur->encap_vrf_id,
decrement the new VRF and increment the old VRF accordingly, and re-add the old
vxlan_hash entry (using rte_hash_add_key_data) before returning; key symbols:
GR_VXLAN_SET_ENCAP_VRF, cur->encap_vrf_id, vxlan_hash, rte_hash_del_key,
rte_hash_add_key_data, rte_hash_lookup.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@modules/l2/control/gr_l2_control.h`:
- Around line 73-87: The VNI lookup mismatch on little-endian systems is caused
by using different encodings: insertion uses vxlan_encode_vni() but control-path
lookups use rte_cpu_to_be_32(), producing different keys; fix by changing the
control-path lookup to use vxlan_encode_vni() (or consistently use
vxlan_encode_vni()/vxlan_decode_vni() everywhere) so keys match insertion/lookup
semantics—update the lookup sites that currently call rte_cpu_to_be_32() to call
vxlan_encode_vni() (or refactor insertion to use rte_cpu_to_be_32() if you
prefer the other convention), ensuring the same encoding function is used in
vxlan_encode_vni, vxlan_decode_vni, hash insertion, and control-path lookup
functions.
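To show why mixing two encoders breaks key matching, here is a sketch under a stated assumption: the body of `vxlan_encode_vni()` is not shown in this thread, so the encoder below simply follows the RFC 7348 header layout, where the 24-bit VNI is followed by 8 reserved bits. `encode_vni`, `decode_vni` and `swap_only` are hypothetical names, and `htonl` stands in for `rte_cpu_to_be_32()`.

```c
#include <arpa/inet.h> /* htonl, ntohl */
#include <assert.h>
#include <stdint.h>

/* Assumed encoder: 24-bit VNI in the upper three bytes of the
 * big-endian 32-bit header word, low byte reserved (zero). */
uint32_t encode_vni(uint32_t vni) {
	assert(vni <= 0xffffff);
	return htonl(vni << 8);
}

/* Inverse of encode_vni. */
uint32_t decode_vni(uint32_t be_field) {
	return ntohl(be_field) >> 8;
}

/* Plain byte-order conversion, as a lookup site might use by mistake. */
uint32_t swap_only(uint32_t vni) {
	return htonl(vni);
}
```

A key inserted with `encode_vni(100)` is never found by a lookup that builds its key with `swap_only(100)`, because the two values differ. Routing every insertion and lookup through one encoder removes that class of mismatch.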
In `@modules/l2/control/vxlan.c`:
- Around line 86-100: The code currently ignores return values from
l4_input_unalias_port and l4_input_alias_port inside the GR_VXLAN_SET_DST_PORT
branch; update the logic to check their returns and handle failures: call
l4_input_unalias_port and if it fails log the error and abort updating
cur->dst_port (or return the error), then call l4_input_alias_port and if it
fails revert any prior unalias/alias changes (restore previous alias state), log
the error and return a failure code instead of setting cur->dst_port;
specifically modify the block handling set_attrs & GR_VXLAN_SET_DST_PORT around
variables next->dst_port and cur->dst_port to only assign cur->dst_port after
successful l4_input_alias_port / l4_input_unalias_port calls and propagate the
error (handle EADDRNOTAVAIL/EADDRINUSE) to the caller.
🧹 Nitpick comments (1)
modules/l2/control/fdb.c (1)
339-347: Redundant assignment of `fdb_max_entries`.
`fdb_reconfig()` already sets `fdb_max_entries = max_entries` at line 81. Line 346 sets it again. Not a bug, but worth noting.
a62fe2b to cd21aba
fde2085 to 86aa07b
Introduce the VXLAN interface type for the L2 module. A VXLAN interface carries a VNI (VXLAN Network Identifier), a local VTEP address used as the outer IP source, an encapsulation VRF for underlay routing, and a configurable UDP destination port (default 4789). VXLAN interfaces are keyed by (VNI, encap_vrf_id) in a lockfree RCU-protected hash table so that the datapath can resolve incoming tunneled packets to the correct interface without locks. VXLAN interfaces are intended to be attached to a bridge domain. All L2 traffic entering the bridge is forwarded transparently over the VXLAN tunnel. The local VTEP address must already be configured in the encapsulation VRF. Signed-off-by: Robin Jarry <rjarry@redhat.com>
VXLAN uses UDP port 4789 by default but allows configuring a custom destination port per interface. Allow the control plane to register additional UDP ports at runtime as aliases for an already registered port, reusing the same datapath edge. Use reference counting so that multiple interfaces sharing the same non-default port do not interfere with each other during teardown. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Wire up the VXLAN interface's configurable destination port to the L4 input node. When a non-default port is configured, register it as an alias for the standard VXLAN port (4789) so that the datapath delivers matching UDP packets to the vxlan_input node. Unregister the alias when the port changes or the interface is destroyed. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Introduce a transport-agnostic flood list framework for BUM traffic (Broadcast, Unknown unicast, Multicast). In EVPN, each PE maintains a flooding list built from IMET routes (RFC 8365, RFC 9572). The entries in this list differ depending on the overlay encapsulation: VXLAN uses a remote VTEP IPv4 address and a VNI, while SRv6 would use a 128-bit SID. The API defines a gr_flood_entry structure with a type discriminant and a union, allowing future encapsulation types (e.g. SRv6 SIDs) to be added without changing the API request types. A dispatch layer in control/flood.c routes add/del/list operations to type-specific callbacks registered at init time. Implement the VXLAN VTEP flood type (GR_FLOOD_T_VTEP). Each VXLAN interface maintains a per-VNI array of remote VTEP addresses used by the vxlan_flood datapath node for ingress replication. The array is replaced atomically with an RCU synchronization barrier so that the datapath never sees a partially updated list. CLI commands are exposed under "flood vtep add/del/show". Add new generated grcli-flood(1) man page. Signed-off-by: Robin Jarry <rjarry@redhat.com>
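The type discriminant plus union described in this commit might look roughly like the sketch below. Field names, the enum values and both helper functions are assumptions for illustration; the actual `gr_flood_entry` layout is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type tags; only the VTEP type exists in this sketch. */
typedef enum {
	FLOOD_T_UNKNOWN = 0,
	FLOOD_T_VTEP = 1,
} flood_type_t;

struct flood_entry {
	flood_type_t type; /* discriminant: selects the valid union member */
	union {
		struct {
			uint32_t vni;
			uint32_t addr; /* IPv4, host byte order in this sketch */
		} vtep;
		/* A future encapsulation (e.g. a 128-bit SRv6 SID)
		 * would add its own member here. */
	};
};

/* Map a CLI-style type name to its tag. */
flood_type_t flood_type_from_name(const char *name) {
	assert(name != NULL);
	if (strcmp(name, "vtep") == 0)
		return FLOOD_T_VTEP;
	return FLOOD_T_UNKNOWN;
}

/* Format an entry by switching on the discriminant.
 * Returns the snprintf length, or -1 for an unknown type. */
int flood_entry_format(const struct flood_entry *e, char *buf, size_t len) {
	assert(e != NULL && buf != NULL);
	switch (e->type) {
	case FLOOD_T_VTEP:
		return snprintf(buf, len, "vtep %u.%u.%u.%u vni %u",
				(unsigned)((e->vtep.addr >> 24) & 0xff),
				(unsigned)((e->vtep.addr >> 16) & 0xff),
				(unsigned)((e->vtep.addr >> 8) & 0xff),
				(unsigned)(e->vtep.addr & 0xff),
				(unsigned)e->vtep.vni);
	default:
		return -1;
	}
}
```

Because callers only ever pass a `struct flood_entry`, a new encapsulation adds an enum value and a union member while the request types stay unchanged.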
In a VXLAN overlay, the bridge needs to know which remote VTEP to use when sending unicast frames to a learned MAC address. Add a VTEP IPv4 address field to FDB entries so that known unicast traffic can be sent directly to the correct tunnel endpoint instead of being flooded to all VTEPs. When bridge_input learns a MAC address from a VXLAN member interface, it records the source VTEP from the decapsulated packet's outer IP header. When forwarding to a known destination, the stored VTEP address is passed to the output path via the mbuf private data so that vxlan_output can build the correct outer header. Only set the VTEP field when the source interface is actually a VXLAN type to avoid storing uninitialized data from other packet paths (control plane, local bridge traffic). Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add three datapath nodes for VXLAN packet processing.
vxlan_input decapsulates incoming UDP/4789 packets. It strips the outer UDP and VXLAN headers, resolves the inner VNI to a VXLAN interface via the RCU-protected hash table, records the source VTEP from the outer IP header into the mbuf private data, and forwards the inner Ethernet frame to iface_input for bridge processing.
vxlan_output encapsulates outgoing frames for a known destination VTEP. It prepends a pre-built IP/UDP/VXLAN header template initialized by the control plane, fills in the per-packet fields (destination VTEP, UDP length, IP length, checksum), and hashes the inner flow to select an ephemeral source port for underlay ECMP (RFC 7348 Section 5). The FIB lookup for the outer IP uses the encapsulation VRF, not the bridge domain.
vxlan_flood handles BUM traffic by replicating the frame to every VTEP in the flood list via ingress replication. The original mbuf is sent to the first VTEP and clones are created for the rest. The bridge_flood node is updated to steer VXLAN member traffic through vxlan_flood instead of direct iface_output.
Signed-off-by: Robin Jarry <rjarry@redhat.com>
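The source port selection mentioned for vxlan_output can be sketched as below. The hash used by the PR is not shown here, so FNV-1a serves purely as a stand-in; `flow_hash` and `vxlan_src_port` are hypothetical names. The port range follows the dynamic/private range (49152-65535) recommended by RFC 7348 Section 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over the inner flow bytes; a stand-in for the real flow hash. */
uint32_t flow_hash(const uint8_t *data, size_t len) {
	uint32_t h = 2166136261u;

	assert(data != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 16777619u;
	}
	return h;
}

/* Map a flow hash into 49152-65535 (host byte order).
 * The low 14 bits of the hash give 16384 possible ports. */
uint16_t vxlan_src_port(uint32_t hash) {
	return (uint16_t)(49152u + (hash & 0x3fffu));
}
```

Packets of the same inner flow hash to the same source port and therefore stay on one underlay path, while different flows spread across ECMP members.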
Set up a VXLAN overlay between grout and a Linux netns peer. Grout runs a bridge with a VXLAN member (VNI 100) and the Linux side mirrors the topology with a kernel VXLAN device enslaved to a Linux bridge. Both sides have flood lists configured with each other's VTEP address for BUM traffic replication. The test verifies L3 connectivity over the tunnel by having the Linux side ping the bridge address. This exercises the full path: ARP resolution over VXLAN, FDB learning from decapsulated traffic, and ICMP echo reply via the VXLAN output encapsulation. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Report bridge interfaces to FRR as ZEBRA_IF_BRIDGE with their MAC address. Tag members with ZEBRA_IF_SLAVE_BRIDGE and propagate the bridge ifindex so that FRR can associate them with the correct master. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Report VXLAN interfaces to FRR's zebra as ZEBRA_IF_VXLAN with the associated L2 VNI information. This allows FRR's EVPN control plane to discover which VNIs are locally configured and advertise them via BGP IMET routes to remote PEs. The VXLAN L2 info includes the VNI, the local VTEP address, and the underlay interface index so that zebra can correlate the tunnel with the correct underlay routing context. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Synchronize bridge FDB entries bidirectionally between grout and FRR. This is required for EVPN to advertise locally learned MAC addresses via BGP type-2 routes and to install remotely learned MACs into the bridge forwarding table.
Zebra's dplane API is asymmetric for MAC/FDB entries. In the downward direction (zebra to dplane provider), zebra uses DPLANE_OP_MAC_INSTALL and DPLANE_OP_MAC_DELETE to push MACs into the dataplane. In the upward direction (dplane provider notifying zebra of learned MACs), DPLANE_OP_NEIGH_INSTALL and DPLANE_OP_NEIGH_DELETE must be used instead. These go through zebra_neigh_macfdb_update() which calls zebra_vxlan_local_mac_add_update() and ultimately triggers BGP EVPN type-2 route advertisement. By contrast, the DPLANE_OP_MAC_* result handler (zebra_vxlan_handle_result) is a no-op.
Despite the NEIGH op name, the context payload uses the macinfo union member and is populated with dplane_ctx_mac_set_*() accessors, exactly like zebra's own netlink provider does in netlink_macfdb_change().
Unlike routes and nexthops which use higher-level zebra APIs that resolve the namespace from the VRF ID, the FDB notification path looks up interfaces via if_lookup_by_index_per_ns(ns_id, ifindex). GROUT_NS must therefore be set on the dplane context for the interface lookup to succeed.
Function names follow zebra's rt_netlink.c naming conventions: grout_macfdb_change() for the upward notification path (like netlink_macfdb_change) and grout_macfdb_update_ctx() for the downward install path (like netlink_macfdb_update_ctx).
Self-event suppression is enabled on the FDB event subscriptions to prevent feedback loops when FRR installs a MAC that was originally learned by grout.
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Handle DPLANE_OP_VTEP_ADD and DPLANE_OP_VTEP_DELETE operations from FRR's EVPN control plane. When BGP learns a remote VTEP via an IMET route (EVPN type-3), zebra pushes the VTEP to the dataplane provider. The grout_vxlan_flood_update_ctx() function (named after zebra's netlink_vxlan_flood_update_ctx() in rt_netlink.c) translates these operations into GR_FLOOD_ADD/DEL requests with GR_FLOOD_T_VTEP type. This is a downward-only path: zebra pushes flood list entries to the dplane provider. There is no upward notification for VTEP flood list changes since grout does not learn VTEPs on its own, they are always provided by FRR's BGP EVPN control plane. This allows BGP EVPN to dynamically manage the per-VNI flood lists used for BUM traffic ingress replication, replacing the need for static flood list configuration via the CLI. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Set up a full EVPN/VXLAN topology between FRR+grout and a standalone FRR+Linux peer. Each side runs a bridge with a VXLAN member (VNI 100) and a host namespace. Both peers run iBGP with the l2vpn evpn address-family and advertise-all-vni. The test verifies that EVPN type-3 (IMET) routes are exchanged so that both sides install each other's VTEP in their flood lists. It then verifies end-to-end L2 connectivity by pinging between the two host namespaces through the VXLAN overlay, which exercises type-2 (MAC/IP) route advertisement and FDB synchronization. Signed-off-by: Robin Jarry <rjarry@redhat.com>
Add VXLAN interface type with encapsulation/decapsulation datapath nodes. Each VXLAN interface maintains a per-VNI flood list of remote VTEPs used for BUM traffic ingress replication.
The flood list API is transport-agnostic, designed to accommodate future SRv6 EVPN support. VXLAN VTEP is the first registered flood type. A dispatch layer routes add/del/list operations to type-specific callbacks.
FRR integration is wired up for bridge interfaces, VXLAN interfaces, FDB entries and flood lists. This enables BGP EVPN type-2 (MAC/IP) and type-3 (IMET) route exchange with remote PEs.
Also fix interface running state not being set on creation. This prevented FRR from seeing logical interfaces as operationally up.
Summary by CodeRabbit
New Features
Tests