-
Notifications
You must be signed in to change notification settings - Fork 804
Description
Hi all, I'd like to suggest some changes to the behaviour around NLM_F_DUMP_INTR and ErrDumpInterrupted.
I read up on the discussion in #1018 on whether or not to make the library automatically retry dumps when the kernel specifies NLM_F_DUMP_INTR, and I have some data to add.
When 1.3.1 came out, we (Cilium) started seeing ErrDumpInterrupted in every CI run, so we wrapped the library and its whole API surface in a package with a retry mechanism: https://github.com/cilium/cilium/tree/main/pkg/datapath/linux/safenetlink. Looking at the linked PRs in #1018, we were clearly not alone in this, this was a breaking change that took people by surprise. We haven't seen any failures since introducing the wrapper package. It's set to 30 retries, which is generous, but I think we could get away with less, though I have no data to back this up.
Now, we want to start using netlink.Handle more since it's nice when working with network namespaces in tests. This means we'd have to make our own safenetlink.Handle and wrap all of its methods as well, and maintain this thing for eternity. There must be a better way, and I'd obviously much rather not have a wrapper package at all.
I won't lament the decision of introducing ErrDumpInterrupted and letting the caller deal with it; I'm guilty of this exact mistake in ebpf-go and ended up walking back my stance a few years later after many user complaints, but let me challenge one argument in the PR in particular: the author argues that a dump with NLM_F_DUMP_INTR set could still contain useful data. But how do you know if it's the right data? The kernel just signaled that its source of truth has changed, so why trust it at all (aside from Go's "ignore first return value if error" convention)? As a simple example: if an interface gets recreated with the same name during the dump, the caller would end up using the old ifindex in follow-up calls if they ignore DUMP_INTR. Retrying until the dataset has settled would be the smart thing to do in almost every case. Not that it ever rules out ToCToU issues completely, but the chances of the data being correct is higher.
As a middle ground, I'd like to propose adding a bounded (10? 24?) retry loop to NetlinkRequest.ExecuteIter since that seems to be what all socket accesses get funneled through, but please correct me if I'm wrong. We could put a retry label before if s == nil { ... getNetlinkSocket() ... } and jump to it from the if dumpIntr { ... } case at the end of the function. Only if the retry limit is hit, ErrDumpInterrupted is returned with some added context that the retries were exhausted. If the label approach isn't acceptable, we could refactor things a bit to make it fit better, but not sure if I want to take that on at the moment.
Thanks!